Description

Detection Of Mammary Tumor Virus-Like 
Sequences In Human Breast Cancer 

Cross-Reference to Related Application 

This application is a continuation-in-part 
application of U.S. Serial No. 08/555,394, filed 
November 9, 1995. 

Statement Regarding Federally Sponsored Research 
This invention was made with funds from the U.S.
government, which has certain rights in the invention.

Introduction 

The present invention relates to materials and
methods for diagnosing breast cancer in humans. It is
based, at least in part, on the discovery that a
substantial percentage of human breast cancer tissue
samples contained nucleic acid sequences corresponding
to a portion of the mouse mammary tumor virus env gene.
In contrast, such sequences were absent in almost all
other human tissues tested.

Background of the Invention 

A large body of information has accumulated about
the molecular biology of MMTV (reviewed in Slagle,
B. L. et al., 1987, in "Cellular and Molecular Biology
of Mammary Cancer", Kidwell et al., eds., Plenum Press,
NY, pp. 275-306). Mouse mammary tumor virus (MMTV) is
associated with a high incidence of breast cancer in
certain strains of mice (over 90% among females), and
has been regarded as a potential model for human
disease.

MMTV does not carry a transforming
oncogene, but rather acts as an insertional mutagen
with several proviral insertion loci designated int-1
or wnt-1 (Nusse, R. et al., 1982, Cell 31:99-109), int-2
(Peters, G. et al., 1983, Cell 33:369-377), int-3
(Gallahan, D. et al., 1987, J. Virol. 61:218-220), int-4
(Roelink, H. et al., 1990, Proc. Natl. Acad. Sci. USA
87:4519-4523) and int-5 (Morris, V.L. et al., 1991,
Oncogene Research 6:53-63), which encode growth
factors or other related proteins. These genes are not
expressed in normal mammary tissue but become activated
after integration of MMTV provirus into the adjacent
chromosomal DNA.

The human homolog of the int-2 locus has been
located on chromosome 11 (Casey, G. et al., 1986,
Mol. Cell Biol. 6:502-510) and has been found amplified
(in 15% of the breast cancers) and also expressed
(Lidereau, R. et al., 1988, Oncogene Res 2:285-291;
Zhou, D.J. et al., 1988, Oncogene 2:279-282; Liscia,
D.S. et al., 1989, Oncogene 4:1219-1224; Meyers, S.L.
et al., 1990, Cancer Res 50:5911-5918). It may be
significant that in tumors from Parsi women, who have a
high incidence of breast tumors, the int-2 locus is
amplified in 50% of the cases (Barnabas-Sohi, N. et
al., 1993, Breast Dis. 6:13-26). The amplification of
int-2 and other genes in 11q13 is indicative of poor
prognosis (Schuuring, E. et al., 1992, Cancer Research
52:5229-5234; Champeme, M-H. et al., 1995, Genes,
Chromosomes and Cancer 12:128-133). Both mouse and
human int-2 have been sequenced (Moore, R. et al.,
1986, EMBO J 5:919-924). The gene encodes a protein of
about 27 kilodaltons (kD) which shows homology to both
basic and acidic fibroblast growth factors (Dickson, C.
et al., 1987, Nature (London) 326:833).

However, efforts to demonstrate the presence of
viruses in human breast cancer through search for viral
particles, immunological cross-reactivity, or sequence
homology have yielded contradictory results. Detectable
MMTV env gene-related antigenic reactivity
has been found in tissue sections of breast cancer
(Mesa-Tejada et al., 1978, Proc. Natl. Acad. Sci. USA
75:1529-1533; Levine, P. et al., 1980, Proc. Am. Assoc.
Cancer Res. 21:170; Lloyd, R. et al., 1983, Cancer
51:654-661), breast cancer cells in culture (Litvinov,
S.V. and Golovkina, T.V., 1989, Acta Virologica 33:137-142),
human milk (Zotter, S. et al., 1980, Eur. J.
Cancer 16:455-467), in sera of patients (Day, N.K.
et al., 1981, Proc. Natl. Acad. Sci. USA 78:2483-2487),
in cyst fluid (Witkin, S.S. et al., 1981, J. Clin.
Invest. 67:216-222) and in particles produced by a
human breast carcinoma cell line (Keydar, I. et al.,
1984, Proc. Natl. Acad. Sci. USA 81:4188-4192).
Sequence homology to MMTV has been found in human DNA
under low stringency conditions of hybridization
(Callahan, R. et al., 1982, Proc. Natl. Acad. Sci. USA
79:5503-5507) and RNA related to MMTV has been detected
in human breast cancer cells (Axel, R. et al., 1972,
Nature 235:32-36). The presence of MMTV-related
sequences in lymphocytes from patients with breast
cancer has been reported (Crepin, M. et al., 1984,
Biochem. Biophys. Res. Comm. 118:324-331), as well as
detection of reverse transcriptase (RT) activity in
their monocytes (Al-Sumidaie, A.M. et al., 1988, Lancet
1:5-8). May and Westley (May and Westley, 1989, Cancer
Research 49:3879-3883) have reported the presence of
MMTV-like sequences arranged as tandem repeats only in
DNA from breast cancer cells.

These results have been difficult to interpret,
and theories linking MMTV or a related virus with human
breast cancer have fallen out of favor, in view of the
relatively recent discovery of human endogenous
retroviral sequences ("HERs"; Westley, B. et al., 1986,
J. Virol. 60:743-749; Ono, M. et al., 1986, J. Virol.
60:589-598; Faff, O. et al., 1992, J. Gen. Virology
73:1087-1097). Data which could be interpreted to
demonstrate the presence of MMTV-related sequences
could be more readily explained by endogenous human
retroviral sequences. Adding further confusion to the
picture, env gene-related antigenicity has been
detected in epitopes of human proteins (Hareuveni, M.
et al., 1990, Int. J. Cancer 46:1134-1135).

Brief Summary of the Invention

The present invention relates to methods for
diagnosing breast cancer in humans in which the
presence of mouse mammary tumor virus env gene-like
sequences bears a positive correlation to the existence
of malignant breast disease. It is based, at least in
part, on the discovery that 38 to 40 percent of human
breast cancer tissue samples tested contained gene
sequences homologous to the mouse mammary tumor virus
env gene that are substantially absent from other human
tumors and tissues. The invention also relates to
methods for diagnosing breast cancer in humans in which
retrovirus proviral fragments substantially
homologous to the env gene and/or 3' LTR
sequence of MMTV are detected. The molecular probes
used in these experiments were designed to avoid
cross-hybridization with endogenous human retroviral
sequences. The present invention further provides for
compositions of molecular probes which may be utilized
in such diagnostic methods.

Brief Description of the Figures

FIGURE 1: Amplification of 660 bp of MMTV-like
env gene. DNA was extracted from frozen tissues. PCR
was performed using primers 1 and 3. A: 2% agarose
gel electrophoresis. B: Southern blot hybridization
using 5' 32P-end-labeled probe 2. Lanes 1 and 3: breast
cancer; lanes 2 and 4: normal breast; lane 5: control
reaction (no DNA); lane E: MMTV env gene. M: molecular
weight marker. Arrow indicates 510 bp band.

FIGURE 2: Nested PCR. A: 2% agarose gel
electrophoresis. 1: Amplification of 686 bp of MMTV-like env
gene sequences using primers 1 and 4 and the product of
reaction A1 as template. 2: Amplification of 250 bp
of MMTV-like env gene sequences using primers 2 and 3.
B, 1 and 2: Southern blot hybridization of the amplified
products using 5'-32P end-labeled probe 2a.

FIGURE 3: Amplification of 250 bp of MMTV-like
env gene. DNA was extracted from paraffin-embedded
tissue sections. PCR was performed using primers 2 and
3. A: 2% agarose gel electrophoresis. B: Southern blot
hybridization using 5'-32P-labeled probe 2a. Lane 1:
normal breast; lanes 2 to 5: breast cancer; lane E:
MMTV env gene. M: molecular weight marker. Arrow
indicates 298 bp band.

FIGURE 4: Nucleotide sequence of the cloned MMTV
env gene-like sequences as compared to the env
sequences of the GR and BR6 strains of MMTV using the
GCG program. *: potential glycosylation site; |: mismatch
to MMTV.

FIGURE 5: Southern blot hybridization of genomic
DNA. DNA was extracted from frozen tissues or cell
lines, digested with EcoRI and transferred to
nitrocellulose paper. Hybridization with 32P-labeled
clone 166. DNA from A, B, and G: env gene positive
breast cancer; C and D: env negative breast cancer;
E and F: normal breast; H: MCF-7 cells. M: molecular
weight marker. Arrow indicates 9 kb band.

FIGURE 6: Southern blot hybridization of genomic
DNA. Experimental conditions as in Fig. 5. DNA from
A and B: env negative breast cancer; C and D: env
positive breast cancer; E: molecular weight marker
(non-labelled); F to H: normal breast. Arrow indicates
position of 9 kb marker.

FIGURE 7: Map of MMTV.

FIGURE 8: Comparison of the nucleic acid sequence
of mouse mammary tumor env gene ("MMTENV"), showing
residues 976-1640, with the nucleic acid sequence of a
representative 660 bp sequence obtained by PCR reaction
of DNA from human breast cancer tissue ("MS1627").

FIGURE 9: Sequence of an about 2.6 kb MMTV-like
fragment detected in a human breast carcinoma.

Detailed Description of the Invention

The present invention relates to methods and 
compositions for diagnosing breast cancer in humans. 

The present invention provides for compositions
comprising an isolated and purified nucleic acid
molecule which (i) hybridizes to a gene of mouse
mammary tumor virus; (ii) is present in at least
20 percent of DNA samples prepared from breast cancer
tissue of different human subjects; and (iii) is
present in less than 5 percent of DNA samples prepared
from tissues other than breast cancer tissue from
different human subjects. A "gene of mouse mammary
tumor virus" includes, but is not limited to, the gag,
pol, and env genes and the 5' LTR and 3' LTR sequences
of MMTV. In preferred embodiments of the invention,
the mouse mammary tumor virus (hereafter "MMTV") gene
is the env gene and/or the 3' LTR sequence. The term
"hybridize" is used to refer to routine DNA-DNA or
DNA-RNA hybridization techniques under what would be
regarded, by the skilled artisan, as stringent
hybridization conditions. The phrase "is present"
indicates that a native form of the molecule, in an
unpurified state (for example, as part of chromosomal
DNA), may be detected by a standard laboratory
technique, such as Southern blot or polymerase chain
reaction (PCR). To be "present", the molecule may be
detectable by one technique but not others. To be
present in "less than 5 percent of DNA samples prepared
from tissues other than breast cancer tissue from
different human subjects", all non-breast cancer tissue
samples are considered together, but the total number
of samples must be large enough to give the 5 percent



BNSDOCID:<WO 9717470A1> 



WO 97/17470 PCT/US96/17877 


value statistical significance that would be reasonable 
to the skilled artisan. 
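
The text leaves the choice of statistical test to the skilled artisan. As one hedged illustration only, the sketch below assumes an exact one-sided binomial (Clopper-Pearson style) upper confidence bound, which is not specified in the source, and asks how many non-breast-cancer samples would be needed before an observed count of positives supports a true frequency below 5 percent. The function names and the 95 percent confidence level are assumptions made for the example.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(k: int, n: int, alpha: float = 0.05) -> float:
    """One-sided exact upper confidence bound on the true positive fraction,
    given k positives among n samples (assumed test, not from the source)."""
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on p; the CDF decreases as p grows
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # seeing <= k positives is still plausible at this p
        else:
            hi = mid
    return hi


def smallest_n(k: int, limit: float = 0.05, alpha: float = 0.05) -> int:
    """Smallest sample count n for which k positives keeps the bound below `limit`."""
    n = k + 1
    while upper_bound(k, n, alpha) >= limit:
        n += 1
    return n


if __name__ == "__main__":
    for positives in (0, 1, 2):
        print(positives, "positives ->", smallest_n(positives), "samples needed")
```

Under these assumptions, even a panel with no positive samples has to be reasonably large before the 5 percent ceiling is supported, which is the point the paragraph above makes qualitatively.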

In order to identify such a nucleic acid molecule,
the sequence of MMTV may be compared, using a computer
database, to known human DNA sequences, and portions
of MMTV which are less than or equal to 25 percent
homologous to a human sequence may be selected for
further study. The term "homologous", as used herein,
refers to the presence of identical residues; for
example, a first sequence is considered 25 percent
homologous to a second sequence if it shares 25 percent
of the residues of the first sequence. Since there is
relatively greater likelihood that MMTV may bear
similarity to human retroviral-like sequences, it may
be preferable to evaluate whether a particular MMTV
nucleic acid sequence is homologous to such sequences,
for example, to endogenous human retrovirus sequences.
A prototype of such viruses is HERV-K10 (Ono, M.
et al., 1986, J. Virol. 60:589-598).
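
The definition of "homologous" given above (identical residues as a fraction of the first sequence) can be sketched in a few lines. This is a minimal illustration that assumes the two sequences have already been aligned to equal length, with '-' marking gaps; the alignment step itself, the function names, and the toy sequences are assumptions and do not reproduce the sequence analysis programs named in this document.

```python
def percent_homology(first: str, second: str) -> float:
    """Percent of residues of `first` that are identical to the aligned
    residue of `second`. Inputs must be pre-aligned (equal length, '-' = gap)."""
    if len(first) != len(second):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(a, b) for a, b in zip(first.upper(), second.upper()) if a != "-"]
    if not pairs:
        raise ValueError("first sequence contains no residues")
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def is_candidate(first: str, second: str, max_percent: float = 25.0) -> bool:
    """True when `first` is at most `max_percent` homologous to `second`."""
    return percent_homology(first, second) <= max_percent


if __name__ == "__main__":
    # Toy sequences, invented for the example only.
    print(percent_homology("ACGTACGA", "ACGTTTTT"))
    print(is_candidate("AAAA", "ATTT"))
```

Dividing by the residue count of the first sequence, rather than by the alignment length, follows the wording of the definition; other conventions would give different percentages for gapped alignments.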

Once an MMTV gene sequence which is less than or
equal to 25 percent homologous to a human DNA sequence,
such as a human endogenous retroviral sequence, is
identified, the presence of nucleic acid molecules
having the MMTV gene sequence in human breast cancer
tissues and other tissues may be evaluated. Such
evaluations may be performed either by Southern blot
techniques, or, preferably, by polymerase chain
reaction (PCR) techniques, which are more sensitive.
In such a way, MMTV gene sequences which (i) hybridize
to at least 20 percent of DNA samples prepared from
breast cancer tissue of different human subjects and
(ii) hybridize to less than 5 percent of DNA samples
prepared from human tissues other than breast cancer
tissues may be identified. A nucleic acid molecule
having a MMTV gene sequence which satisfies these
requirements may then be used in diagnostic methods
which detect the presence of such sequence in human
breast tissue by standard techniques, including PCR
techniques which assay for the presence of the
molecule, but also, where appropriate, Southern blot,
Northern blot, or Western blot techniques, to name but
a few.

In preferred embodiments, the present invention
relates to a portion of MMTV localized between MMTV env
gene sequences 976 and 1640 (Majors, J.E. and Varmus,
H.E., 1983, J. Virol. 47:495-504; see Fig. 7). This
about 660 bp sequence (hereafter, "the 660 bp
sequence") has been found to exhibit low (16 percent)
homology to the prototype human endogenous retrovirus
HERV-K10, using the IBI/Pustell Sequence Analysis
Program, and has also been shown to be present in 121
(38.5%) of 314 unselected breast cancer tissue samples,
in cultured breast cancer cells, in 2 of 29 breast
fibroadenomas (6.9%) and in 2 of 107 breast specimens
from reduction mammoplasties (1.8%). The sequence was
not found in normal tissues including breast, in
lymphocytes from breast cancer patients, nor in other human
cancers or cell lines (see example section, infra).
Similarly, an about 250 bp sequence (hereafter "the
250 bp sequence"), between positions 1388 and 1640 in
the env gene, and therefore falling within the 660 bp
sequence, was detected in 60 (39.7%) of 151 breast
cancer samples, and in one of 27 normal breast samples assayed
from paraffin-embedded sections. Cloning and sequencing
of the 660 bp and 250 bp sequences demonstrated
that they are 95-99% homologous to MMTV env gene, but
not to the known human endogenous retroviruses ("HERs")
nor to other viral or human genes (<18%).
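
The reported frequencies can be checked against the two thresholds stated earlier (present in at least 20 percent of breast cancer samples, and in less than 5 percent of other samples considered together). The sketch below uses only the counts quoted in the paragraph above; pooling the fibroadenoma and reduction mammoplasty specimens into one non-cancer group is an assumption made for illustration, since the other tissues tested are not given as counts here.

```python
def percent(positive: int, total: int) -> float:
    """Positive samples as a percentage of all samples tested."""
    return 100.0 * positive / total


def meets_criteria(cancer, others, min_cancer=20.0, max_other=5.0) -> bool:
    """cancer: (positive, total); others: list of (positive, total) groups
    that are pooled before the comparison against max_other."""
    pooled_pos = sum(p for p, _ in others)
    pooled_total = sum(t for _, t in others)
    return (percent(*cancer) >= min_cancer
            and percent(pooled_pos, pooled_total) < max_other)


# Counts quoted in the text for the 660 bp sequence.
BREAST_CANCER = (121, 314)
FIBROADENOMA = (2, 29)
REDUCTION_MAMMOPLASTY = (2, 107)

if __name__ == "__main__":
    print(round(percent(*BREAST_CANCER), 1))
    print(meets_criteria(BREAST_CANCER, [FIBROADENOMA, REDUCTION_MAMMOPLASTY]))
```

Note that the fibroadenoma group alone exceeds 5 percent, while the pooled non-cancer group does not; this is why the text specifies that non-breast-cancer samples are considered together.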

In another preferred embodiment, the present
invention relates to a nucleic acid molecule which
corresponds to a retroviral genomic fragment which has
substantial homology to 3' LTR and/or env gene of the
MMTV genome, and is found in a substantial percentage
of breast cancer samples. By substantial percentage is
meant at least 20% of tested breast cancer samples.
Such a sequence is preferably comprised of the 3' LTR
region and all or part of the env gene, although it may
include more sequences of a retroviral genome. Most
preferably, the sequence is at least comprised of an
about 2.6 kb fragment which comprises the 1,228 base
pair (bp) sequence of the 3' LTR sequence and 1,336 bp
of the env gene sequence of MMTV (Fig. 9) (SEQ ID
NO: 20). When compared with the two strains of MMTV, C3H
and BR6, the sequence homology was 90.8% and 90.7%,
respectively. When compared with the endogenous
retroviral sequences (HUMERKA), sequence homology was
only 58% in 36 bp and 71% in 74 bp.

Retrovirus proviral sequences can be detected by
PCR technology using primers derived from the MMTV
genome. Such primers include primer 5L, containing the
nucleotides 7376-7395 of the MMTV BR6 genome (5'-3':
CCAGATCGCCTTTAAGAAGG) (SEQ ID NO: 11) and primer LTR3,
containing nucleotides 9918-9927 of the MMTV BR6 genome
(5'-3': CGAACAGACACAAAGCGACG) (SEQ ID NO: 19). Other
primers which correspond to or are homologous to MMTV
sequences can be used as primers. Nucleotide fragments
which correspond to or are homologous to the retroviral
sequences isolated from the breast cancer samples can
also be used to amplify additional retroviral fragments
from the samples. Long PCR techniques can be used to
amplify longer stretches of a proviral sequence.

The present invention provides for compositions
comprising an isolated and purified nucleic acid
molecule which hybridizes to the about 2.6 kb
retroviral fragment shown in Fig. 9 under stringent
conditions or is at least 90 percent homologous to said
fragment using the MacVector homology determining
program, and which may be used to diagnose breast cancer in
a subject, using methods which include PCR and Southern
blot methods.

Nucleic acids having the 660 bp sequence, the
250 bp sequence, or all or part of the about 2.6 kb
sequence, may therefore be used, according to the
invention, to diagnose breast cancer in a subject,
using methods which include PCR and Southern blot
methods. Where PCR methods are used, primers such as
those listed in Table 1, below, may be utilized.

The present invention provides for compositions
comprising essentially purified and isolated nucleic
acid having the 660 bp sequence or the 250 bp sequence
or an at least five bp, and preferably greater than or
equal to ten bp, subsequence thereof. In order to
maintain the desired specificity, such nucleic acid
molecules may preferably contain sequence falling
within the 660 bp sequence, but preferably do not
contain sequences from other portions of the MMTV
genome, which may, undesirably, hybridize to human
sequences which are not breast cancer specific, such as
HERs. Accordingly, the present invention provides for
compositions wherein the isolated and purified nucleic
acid molecule comprises at least a portion having a
nucleic acid sequence which hybridizes to a region of
the mouse mammary tumor virus env gene between residues
976 and 1640, or between residues 1388 and 1640, and
wherein the isolated and purified nucleic acid molecule
does not hybridize to any other region of the MMTV
genome.

The 660 bp sequence, in various embodiments, may
have a number of nucleotide sequences. For example, in
one embodiment, the 660 bp sequence may have a sequence
as set forth in Fig. 8 and designated "MMTENV-like
sequence" (SEQ ID NO: 17), which depicts the MMTV env
sequence between residues 976 and 1640. In a second
series of embodiments, the 660 bp sequence may have a
sequence as set forth in Fig. 8 and designated "MS1627"
(SEQ ID NO: 18), which depicts a predominant sequence
for the 660 bp sequence as it has been defined by
sequencing analysis of the products of PCR reactions
using DNA from human breast cancer tissues. In still
further embodiments, the 660 bp sequence may have
various other nucleotide sequences obtained by
sequencing the results of PCR reactions to detect the
presence of 660 bp sequence in human breast cancer
tissues.

In related embodiments, the present invention
provides for compositions comprising PCR primers
that may be used to detect the presence of the
aforementioned molecules or other MMTV-like sequences.
For example, the compositions may comprise one or more
of the following primer molecules (5' - 3'):
CCTCACTGCCAGATC (SEQ ID NO: 1); GGGAATTCCTCACTGCCAGATC
(SEQ ID NO: 2); CCTCACTGCCAGATCGCCT (SEQ ID NO: 3);
TACATCTGCCTGTGTTAC (SEQ ID NO: 4); CCTACATCTGCCTGTGTTAC
(SEQ ID NO: 5); CCGCCATACGTGCTG (SEQ ID NO: 6);
ATCTGTGGCATACCT (SEQ ID NO: 7); GGGAATTCATCTGTGGCATACCT
(SEQ ID NO: 8); ATCTGTGGCATACCTAAAGG (SEQ ID NO: 9);
GAATCGCTTGGCTCG (SEQ ID NO: 10); CCAGATCGCCTTTAAGAAGG
(SEQ ID NO: 11); TACAGGTAGCAGCACGTATG (SEQ ID NO: 12);
CGAACAGACACAAACACACG (SEQ ID NO: 19).

The use of such compositions and molecules in PCR
and Southern blot techniques is illustrated in the
non-limiting examples set forth below. The correlation
between the presence of the MMTV-related nucleic acid
molecules described above and breast cancer allows such
molecules and compositions to be utilized in the
diagnosis of breast cancer. Accordingly, the present
invention provides for a method of diagnosing breast
cancer, wherein the detection of such nucleic acid
molecules bears a positive correlation to the existence
of breast cancer in a human. The results of such
evaluation, together with additional clinical symptoms,
signs, and laboratory test values, may be used to
formulate the complete diagnosis of the patient.

In further related embodiments, the present
invention provides for an essentially purified peptide
encoded by a nucleic acid molecule which (i) hybridizes
to a gene of MMTV; (ii) is present in at least
20 percent of DNA samples prepared from breast cancer
tissue of different human subjects; and (iii) is
present in less than 5 percent of DNA samples prepared
from tissues other than breast cancer tissue from
different human subjects. In preferred embodiments, the
MMTV gene is the env gene.

Such peptides may be used in the diagnosis of
breast cancer. Accordingly, the present invention
provides for a method of diagnosing breast cancer in
a human subject, comprising detecting the presence of
a peptide encoded by a nucleic acid molecule which
(i) hybridizes to the env gene of a mouse mammary tumor
virus; (ii) is present in at least 20 percent of DNA
samples prepared from breast cancer tissue of different
human subjects; and (iii) is present in less than
5 percent of DNA samples prepared from tissues other
than breast cancer tissue from different human
subjects.

The present invention also provides for antibodies
(including monoclonal and polyclonal antibodies) which
specifically bind to such peptides. Such antibodies may
be used in methods of diagnosing breast cancer, for
example, but not by way of limitation, by Western blot,
immunofluorescent techniques, and so forth.

In nonlimiting embodiments of the invention, the
skilled artisan may evaluate MMTV-like nucleic acid
molecules for regions which would be considered likely
to encode immunogenic peptides (using, for example,
hydropathy plots). Such peptides may then be sequenced
and used to produce antibodies that may be employed in
diagnostic methods as set forth above.

For example, certain peptides encoded by portions
of the 660 bp sequence have been synthesized. These
peptides, which have the sequences LKRPGFQEHEMI (SEQ ID
NO: 13) and GLPHLIDIEKRG (SEQ ID NO: 14), have been used
to produce antibodies in rabbits, and the resulting
antisera have successfully identified breast cancer
cells positive for MMTV env-like sequences by PCR
assay. Other peptides encoded by the 660 bp sequence which
may be useful according to the invention include
TNCLDSSAYDTA (SEQ ID NO: 15) and DIGDEPWFDD (SEQ ID
NO: 16).
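
As a rough illustration of the hydropathy evaluation mentioned above, the sketch below scores the four peptides listed in the text with the Kyte-Doolittle hydropathy scale. The choice of that particular scale and of a simple mean score per peptide are assumptions for the example; the source does not state which scale or window was used. More negative means more hydrophilic, which is commonly taken as a hint that a segment may be surface-exposed.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the source).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Peptide sequences as listed in the text (SEQ ID NO: 13 to 16).
PEPTIDES = {
    "SEQ ID NO: 13": "LKRPGFQEHEMI",
    "SEQ ID NO: 14": "GLPHLIDIEKRG",
    "SEQ ID NO: 15": "TNCLDSSAYDTA",
    "SEQ ID NO: 16": "DIGDEPWFDD",
}


def mean_hydropathy(peptide: str) -> float:
    """Average Kyte-Doolittle score over all residues of the peptide."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide.upper()) / len(peptide)


if __name__ == "__main__":
    for name, seq in PEPTIDES.items():
        print(f"{name}  {seq:<14} {mean_hydropathy(seq):6.2f}")
```

A full hydropathy plot would slide a window along the whole translated sequence; scoring the short peptides directly is the simplest version of the same idea.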

6. Example: The Detection of Mouse Mammary Tumor
Virus Env Gene-Like Sequences in Human Breast
Cancer Cells and Tissues

6.1. Materials and Methods 
DNA from breast cancer tissue and other human
cancer tissues, human placentas, normal human tissues
including breast, and from several human cell lines
(including eight breast cancer cell lines), and two
normal breast cell lines was extracted following the
procedure of Delli Bovi et al. (1986, Cancer Res.
46:6333-6338). The DNA was resuspended in a solution
containing 0.05 M Tris-HCl buffer, pH 7.8, and 0.1 mM
EDTA, and the amount of DNA recovered was determined by
microfluorometry using Hoechst 33258 dye (Cesarone, C.
et al., 1979, Anal Biochem 100:188-197). Plasmids
containing the cloned genes of MMTV were obtained from
the ATCC, propagated in Escherichia coli cultures and
purified using anion-exchange minicolumns (Qiagen) or
by precipitation with polyethylene glycol (Sambrook, J.
et al., 1989, in "Molecular Cloning: A Laboratory
Manual", Cold Spring Harbor). Oligonucleotide primers
were synthesized at the core facilities of the
Brookdale Molecular Biology Center at Mount Sinai
School of Medicine.

Polymerase chain reaction (PCR) was performed
using Taq polymerase following the conditions
recommended by the manufacturer (Perkin Elmer Cetus)
with regard to buffer, Mg2+ and nucleotide
concentrations. Thermocycling was performed in a DNA cycler
by denaturation at 94°C for 3 min. followed by either
35 or 50 cycles of 94°C for 1.5 min., 50°C for 2 min.
and 72°C for 3 min. The ability of the PCR to amplify
the selected regions of the MMTV env gene was tested by
using as positive templates the cloned MMTV env gene
and the genomic DNA of the MCF-7 cell line, since it
was shown to express gp52 immunological determinants
(Yang, N.S., et al., 1975, J. Natl. Cancer Inst.
61:1205-1208). Optimal Mg2+, primer concentrations and
requirements for the different cycling temperatures
were determined with these templates. The master mix
as recommended by the manufacturer was used. To detect
possible contamination of the master mix components, a
reaction without template was routinely tested. λ DNA
and control primers provided by the manufacturer were
used as control for polymerase activity. As an
internal control, amplification of a 120 bp sequence
of the estrogen receptor gene was assayed using primers
designed and generously provided by Dr. Beth Schachter
(Mount Sinai School of Medicine, N.Y.). In addition,
primers for actin β gene amplification were also used.

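
The cycling program described above can be written down as data, which makes the run length for the two cycle counts easy to compare. This is a minimal sketch: the `Step` class and function names are hypothetical, and ramp times between temperatures are ignored, so real runs would take longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float


# Values taken from the protocol text; names are illustrative.
INITIAL_DENATURATION = Step("initial denaturation", 94, 3.0)
CYCLE = (
    Step("denaturation", 94, 1.5),
    Step("annealing", 50, 2.0),
    Step("extension", 72, 3.0),
)


def total_minutes(cycles: int) -> float:
    """Programmed hold time only (no ramping) for the given number of cycles."""
    per_cycle = sum(step.minutes for step in CYCLE)
    return INITIAL_DENATURATION.minutes + cycles * per_cycle


if __name__ == "__main__":
    for n in (35, 50):
        print(f"{n} cycles: {total_minutes(n)} min ({total_minutes(n) / 60:.1f} h)")
```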
The product of the PCR was analyzed by
electrophoresis in a 2% agarose gel. A 1 kb DNA ladder (Gibco
BRL) was used to identify the size of the PCR product.
To determine if the amplified sequences of the middle
region of the 660 bp faithfully reproduced the
sequences of the env gene of MMTV, an 18-mer sequence
within the env gene was used as a probe for the 660 bp
amplified sequence. The 18-mer probe was 5' end-labeled
with 32P-ATP using T4 polynucleotide kinase and
purified by the NENSORB nucleic acid purification
cartridge (NEN). Southern blot hybridization was
performed using the conditions described by Saiki
et al. (1985, Science 230:1350-1354).

The product of the PCR (660 bp or 250 bp) was
cloned directly from the reaction mixture into the TA
cloning vector (Invitrogen) using the TA cloning kit
and following the conditions recommended by the
supplier. Direct cloning of the fragment isolated from
the gel was also performed. Plasmid DNA was purified
by CsCl density gradient centrifugation or by
precipitation with polyethylene glycol (Sambrook
et al., 1989, in "Molecular Cloning: A Laboratory
Manual", Cold Spring Harbor), restricted with HindIII
and EcoRI, electrophoresed in 2% agarose gels and
transferred to nitrocellulose filters. Southern blot
hybridization was carried out using a 5'-terminal
labeled internal probe as described above. Cloning
procedures were performed in laboratories totally
separate from those where PCR was carried out.
Automated DNA sequencing (using Applied Technology
Sequencer Model 373A) was performed in the Brookdale
Molecular Biology Center. Sequence homology was
determined using the IBI MacVector, GenBank and GCG
programs.

To prevent contamination of the samples, processing
of human tissues was performed in a laminar flow
hood. DNA extractions were done in a chemical hood
located in a different room from that where PCR was
performed. PCR assays were assembled in a biological
hood provided with ultraviolet light. Aerosol
resistant tips and dedicated positive-displacement
pipettes were used throughout. All equipment used for
PCR (microcentrifuge, electrophoresis apparatus,
pipettors) was cleaned each time with 10% sodium
hypochlorite to assure DNA decontamination (Prince and
Andrus, 1992, Biotechniques 12:358-36). After the
initial experiments were performed, the plasmid
containing the MMTV env gene was frozen and never used
again, to avoid contamination. However, to detect
plasmid contamination from our own env gene clones,
primers were designed to amplify plasmid sequences.
All the authentic MMTV env positive samples were then
tested and found negative for plasmid contamination.

Southern blotting and hybridization were performed
as described (Southern, E.M., 1975, J. Mol. Biol.
98:503-517), using the 660 bp cloned sequences labeled
by the random primer procedure (Feinberg, A.P., et al.,
1983, Anal. Biochem. 132:6-13). Prehybridization and
hybridization were performed in a solution containing
6 x SSPE, 5% Denhardt's, 0.5% SDS, 50% formamide,
100 μg/ml denatured salmon testis DNA, incubated for
18 hrs at 42°C, followed by washings with 2 x SSC and
0.5% SDS at room temperature and at 37°C and finally in
0.1 x SSC with 0.5% SDS at 68°C for 30 min (Sambrook
et al., 1989, in "Molecular Cloning: A Laboratory
Manual", Cold Spring Harbor). For paraffin-embedded
tissue sections the conditions described by Wright and
Manos (1990, in "PCR Protocols", Innis et al., eds.,
Academic Press, pp. 153-158) were followed using
primers designed to detect a 250 bp sequence.

6.2. Results 

6.2.1. Selection of Specific MMTV Env Gene Sequences 
A computer search for MMTV env gene homologous
sequences was first performed, since sequence homology
between the human endogenous retroviral sequences and
MMTV had been described. The prototype of this group
of human endogenous retroviruses is HERV-K10 (Ono, M.
et al., 1986, J. Virol. 60:589-598). The sequences of
the env gene of MMTV (Majors, J.E. and Varmus, H.E.,
1983, J. Virol. 47:495-504) were aligned with sequences
of the env gene of the human endogenous retrovirus
HERV-K10 (Ono, M. et al., 1986, J. Virol. 60:589-598),
using the IBI/Pustell Sequence Analysis Program. A
region of 660 bp of low homology (16%) was localized
between MMTV env gene sequences 976 and 1640 (Majors,
J.E. and Varmus, H.E., 1983, J. Virol. 47:495-504). This
internal domain of the outer membrane of the env gene
has only one glycosylation site and is highly conserved
between strains. Two primers comprising 15 bp
sequences at positions 976-990 (primer 1) and 1626-1640
(primer 3) were first synthesized. Later, longer
primers were synthesized (1N and 3N). An 18-mer
sequence in the middle of the 660 bp MMTV env region
(1388-1405) (primer 2) was used as a probe to identify
the 660 bp sequence. A second oligomer probe was
synthesized comprising the sequence 1554 to 1568
(primer 2a) to be used for hybridization when a
sequence of around 250 bp (between positions 1388 and
1640) was amplified. For nested PCR reactions (Mullis,
K.B. and Faloona, F.A., 1987, Meth. Enzymol. 155:335-350),
another primer comprising sequences 1647 to 1661
(primer 4) was synthesized to be used with primer 1 in
the first reaction, followed by primers 2 and 3 in the
second. Modified primers with GC clamps and extra
sequences were also synthesized and used in the PCR
(primers 1a and 3a). Another set of primers comprising
sequences 984 to 1003 (5L) and 1558 to 1577 (3L) was
subsequently developed because their Tm's matched and
provided better amplification than the original
primers. The sequences are represented in Table 1.
All of them were productive in amplification reactions.
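
The remark that the Tm's of primers 5L and 3L matched can be
illustrated with a rough calculation. The sketch below assumes the
simple Wallace rule (2 degrees C per A or T, 4 degrees C per G or C),
which is only a rule of thumb and is not stated in the text to be the
method actually used; the function name wallace_tm is hypothetical.
The sequences are those listed in Table 1.

```python
# Rough melting-temperature estimate for short oligonucleotides.
# Assumption: Wallace rule, Tm ~ 2*(A+T) + 4*(G+C) in degrees C.
# This is an illustrative approximation, not the inventors' stated method.

def wallace_tm(primer: str) -> int:
    """Return the Wallace-rule Tm estimate (degrees C) for a primer."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

# Primer sequences (5'-3') as listed in Table 1.
PRIMERS = {
    "1": "CCTCACTGCCAGATC",
    "3": "ATCTGTGGCATACCT",
    "5L": "CCAGATCGCCTTTAAGAAGG",
    "3L": "TACAGGTAGCAGCACGTATG",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"primer {name}: {len(seq)} nt, estimated Tm {wallace_tm(seq)} C")
```

Under this approximation 5L and 3L give the same estimate, whereas
the original primers 1 and 3 differ by a few degrees, which is at
least consistent with the statement above.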
Table 1. Primer and probe sequences and location
in mouse mammary tumor virus env gene

Primer   Sequence (5'-3')           Location

1        CCTCACTGCCAGATC            976-990
1a       GGGAATTCCTCACTGCCAGATC     976-990
1N       CCTCACTGCCAGATCGCCT        976-993
2        TACATCTGCCTGTGTTAC         1388-1405
2N       CCTACATCTGCCTGTGTTAC       1386-1405
2a       CCGCCATACGTGCTG            1554-1568
3        ATCTGTGGCATACCT            1640-1626
3a       GGGAATTCATCTGTGGCATACCT    1640-1626
3N       ATCTGTGGCATACCTAAAGG       1640-1621
4        GAATCGCTTGGCTCG            1661-1647
5L       CCAGATCGCCTTTAAGAAGG       984-1003
3L       TACAGGTAGCAGCACGTATG       1558-1577


6.2.2. Detection of MMTV-Like Env Gene
Sequences in Human Breast Tumors
PCR was performed on DNA extracted from breast
cancer tissues, normal breast tissues and from the
plasmid containing the env gene of MMTV, using primers
1 and 3. Photographs of the ethidium bromide stained
gels of the PCR product reveal the presence of an
approximately 660 bp sequence in some of the tumors
(Fig. 1A, lanes 1 and 3), but not in the normal tissue
samples (Fig. 1A, lanes 2 and 4). As a positive
control the MMTV env gene was also amplified (Fig. 1A,
lane E). Similar results were obtained with modified
primers. Hybridization
of the gel with 32P-labeled 18-mer oligonucleotide
(primer 2) indicated that this internal sequence was
present in the amplified material (Fig. 1B) and that
the bands in the gel were not artifactual.

Our initial effort was to analyze a representative
sample of breast cancer specimens as well as normal
tissues and other tumors. To date 343 breast tumors
have been processed, DNA extracted and PCR performed.
Of these 343 tumors, 314 were carcinomas and 29 were
fibroadenomas. Amplification of sequences of 660 bp
was observed in 121 of the carcinomas (38.5%) and in
2 of the 29 fibroadenomas (6.9%). These sequences
were confirmed to be MMTV env gene-like sequences by
hybridization with the labeled specific probe
containing the internal sequences. These sequences
were not detected in the DNAs extracted from 20 normal
organs, 23 cancers from other organs and 26 samples of
blood lymphocytes, including 7 from breast cancer
patients whose breast specimens were positive. Of
107 samples of normal breast obtained from reduction
mammoplasties, 2 were positive (1.8%). In addition to
DNA from lymphocytes from seven positive patients, DNA
from normal breast tissue of the operated breast
was tested in 4 cases. All were negative (Table 2).
Finally, DNA of the MCF-7 and ED (a cell line
developed in our laboratory from the pleural effusion
of a patient with an env-positive breast tumor) breast
cancer cell lines was shown to contain the 660 bp MMTV
env gene-like sequences (Table 3), while four other
breast cancer cell lines were positive only for the
250 bp sequence (T47-D, BT-474, BT-20 and MDA-MB-231).
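
The percentages quoted above follow directly from the reported
counts. As a minimal sketch (the helper name percent_positive is
hypothetical and only the counts themselves come from the text), the
figures can be recomputed as follows.

```python
# Recompute percent-positive figures from the counts reported in the text.
# Only the (positive, total) counts are taken from the source; the helper
# name and layout are illustrative.

COUNTS = {
    "breast carcinomas": (121, 314),
    "breast fibroadenomas": (2, 29),
    "normal breasts (reduction mammoplasty)": (2, 107),
}

def percent_positive(positive: int, total: int) -> float:
    """Return positive/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be a positive integer")
    return 100.0 * positive / total

if __name__ == "__main__":
    for label, (pos, total) in COUNTS.items():
        print(f"{label}: {pos}/{total} = {percent_positive(pos, total):.2f}%")
```

The carcinoma and fibroadenoma values round to the 38.5% and 6.9%
given above; 2 of 107 computes to roughly 1.87%, which the text
reports as 1.8%.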
Table 2. Detection of MMTV env gene-like
sequences in human DNA extracted
from fresh or frozen tissues

                                    MMTV env gene
Sample                     Number   sequences       % Positive

Breast Carcinomas          314      121             38.5%
Breast Fibroadenomas       29       2               6.9%
Normal Breasts             107      2               1.8%
*Normal Breasts            4        negative
Tumors other than breast   23       negative
Normal tissues             20       negative
Lymphocytes                26       negative
**Lymphocytes              7        negative

*  Histologically normal tissue from same breast
   as positive cancer.
** Lymphocytes from breast cancer patients who were
   positive for MMTV env gene sequences in the tumor.
Table 3. Detection of MMTV env gene-like sequences
in DNA from human cell lines in culture

Human Cell Lines                          MMTV env gene sequence

MCF-7        (breast carcinoma)           positive
T47-D        (breast carcinoma)           negative
BT-20        (breast carcinoma)           negative
MDA-MB-231   (breast carcinoma)           negative
ZR-75-1      (breast carcinoma)           negative
SK-BR-3      (breast carcinoma)           negative
BT-474       (breast carcinoma)           negative
ED           (breast carcinoma)           positive
MCF-10       (normal breast)              negative
HB-447       (normal breast)              negative
HL-60        (promyelocytic leukemia)     negative
K562         (erythroleukemia)            negative
Jurkat       (T cell leukemia)            negative
Hep G-2      (hepatoma)                   negative



The nested polymerase chain reaction was used in
several instances to increase sensitivity and
specificity, thus reducing the probability of false
positives. In Fig. 2, results of a representative
nested reaction are shown using primers 1 and 4 in the
first reaction (Fig. 2A) and primers 2 and 3 in the
second reaction. The specificity of the reaction can
be seen in the second amplification (Fig. 2B).
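
The geometry of the nested reaction can be checked from the primer
coordinates in Table 1. The following is a minimal sketch only: the
Primer tuple and the helper names amplicon_length and is_nested are
hypothetical, and the lengths are simple inclusive coordinate
differences on the MMTV env gene rather than measured product sizes.

```python
# Amplicon geometry for the primer pairs described in the text.
# Coordinates are MMTV env gene positions as listed in Table 1;
# the data structure and helper names are illustrative assumptions.
from typing import NamedTuple

class Primer(NamedTuple):
    name: str
    start: int  # lower env gene coordinate covered by the primer
    end: int    # higher env gene coordinate covered by the primer

P1 = Primer("1", 976, 990)     # forward
P2 = Primer("2", 1388, 1405)   # forward (inner)
P3 = Primer("3", 1626, 1640)   # reverse (inner)
P4 = Primer("4", 1647, 1661)   # reverse (outer)

def amplicon_length(forward: Primer, reverse: Primer) -> int:
    """Inclusive length from the forward primer's first base
    to the reverse primer's last base."""
    if forward.start >= reverse.end:
        raise ValueError("forward primer must lie upstream of reverse primer")
    return reverse.end - forward.start + 1

def is_nested(outer_f: Primer, outer_r: Primer,
              inner_f: Primer, inner_r: Primer) -> bool:
    """True if the inner product lies entirely within the outer product."""
    return outer_f.start <= inner_f.start and inner_r.end <= outer_r.end

if __name__ == "__main__":
    print("primers 1 + 3:", amplicon_length(P1, P3), "bp")
    print("primers 1 + 4:", amplicon_length(P1, P4), "bp")
    print("primers 2 + 3:", amplicon_length(P2, P3), "bp")
    print("nested:", is_nested(P1, P4, P2, P3))
```

On these coordinates the primer 1 and 3 product spans 665 positions
and the primer 2 and 3 product spans 253, in line with the
"approximately 660 bp" and "around 250 bp" products described.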

To study a large number of samples and to be able
to perform archival studies, PCR of paraffin-embedded
tissue sections was also carried out. Primers 2 and 3
were used to amplify a 250 bp sequence within the
660 bp stretch when DNA was extracted from
paraffin-embedded tissue sections, since larger
sequences are difficult to amplify after fixation.
Tumor DNA was amplified (Fig. 3A, lanes 2-5) whereas
normal breast DNA was not (Fig. 3A, lane 1). The
identification of this 250 bp sequence with the
MMTV-like env gene was confirmed by hybridization with
an internal probe (primer 2a) as shown in Fig. 3B.
Using this procedure we have analyzed 151 breast cancer
samples and found that 60 (39.7%) possess the 250 bp
sequence. Of the 27 normal breast samples obtained
from reduction mammoplasties assayed by this procedure,
one was positive (3.7%). These results, in conjunction
with those obtained from lymphocytes and from normal
breast tissue of patients whose breast cancer was PCR
positive, indicate that MMTV-like sequences are present
in a significant number of human breast cancer DNAs,
a finding that cannot be explained by DNA polymorphism.

6.2.3. Cloning and Sequencing of the
MMTV-Like Env Gene Sequences

To find out whether there was homology to the MMTV env
gene throughout the whole 660 bp stretch, the product
of the PCR from 8 different tumors was cloned and
sequenced. In Fig. 4 the sequences of different clones
comprising around 600 bp are represented, as aligned to
the MMTV env gene sequence of the GR and BR6 strains
(Redmond, S. and Dickson, C., 1983, EMBO J. 2:125-131).
This domain of the env gene in the GR strain is 100%
homologous to the C3H strain and 98% to the BR6 strain
(Majors, J.E. and Varmus, H.E., 1983, J. Virol.
47:495-504; Moore, R. et al., 1987, J. Virol.
61:480-490). Evaluation of the clones indicated that
homology to the MMTV env gene varied from 95% to 99%.
Another seven clones comprising only 250 bp were also
sequenced. Homology to the MMTV env gene varied from
95% to 99% (data not shown). When compared to the human
endogenous provirus HERV-K10, the homology of all the
clones was less than 15%. When compared against all
known viral and human genes (more than 130,000 entries)
using the IBI MacVector, GenBank and GCG programs, the
highest homology recorded was 18%.
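
Homology figures of this kind are, in essence, percent identities
over an alignment. The sketch below shows one simple way such a
figure could be computed; it assumes the two sequences have already
been aligned to equal length (with "-" marking gaps), uses
hypothetical toy sequences rather than the cloned sequences, and is
not a description of how the programs named above calculate homology.

```python
# Percent identity between two pre-aligned sequences.
# Assumptions: both strings have equal length, "-" denotes a gap,
# and identity is counted over all alignment columns.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of alignment columns with identical bases."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)

if __name__ == "__main__":
    # Hypothetical toy alignment with a single mismatch in ten columns.
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

Other conventions exist (for example, excluding gap columns from the
denominator), so reported values depend on the convention chosen.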
6.2.4. Southern Blot Analysis
Using Cloned Sequences

To investigate whether the env gene-like sequences
were present in human DNA, Southern blot hybridization
was performed using the cloned sequence as probe. DNAs
from normal breast tissues, env positive or negative
breast tumors, tumors other than breast and breast
cancer cell lines were restricted with EcoRI and in
some instances with PstI, BglII or KpnI. EcoRI is a
frequent cutter restriction enzyme that digests MMTV
proviral DNA between env and pol genes. Four different
cloned 660 bp sequences were used as probes after
labeling with 32P by random prime-labeling. Results of
some of the Southern blot hybridization experiments are
shown in Fig. 5. They reveal the presence of a labeled
restriction fragment migrating at approximately 7-8 kb
in breast cancer DNA and in ED, and two fragments in
MCF-7 cells. Different restriction patterns were
observed with the other three enzymes. The 660 bp
sequence was absent in 10 normal tissues, 10
fibroadenomas and 10 tumors from other tissues. It is
important to emphasize that hybridization conditions
for these experiments were stringent (as described in
Section 6.1) to avoid interference with endogenous
sequences that might interact with the probes.

7. Example: Detection of a Retrovirus 
Proviral Fragment in Human 
Breast Cancer Cells and Tissues 

7.1. Materials and Methods 

To detect longer retrovirus proviral fragments in
breast cancer samples, DNA was extracted from breast
carcinoma tissue samples as described above in
Section 6.1. Two rounds of long PCR were performed on
the DNA using primers 5L (SEQ ID NO: 11) and LTR3
(SEQ ID NO: 19). The primer 5L contains nucleotides
7370-7395 of the MMTV BR6 genome
(5'-3': CCAGATCGCCTTTAAGAAGG) (SEQ ID NO: 11) and
primer LTR3 contains nucleotides 9918-9927 of the MMTV
BR6 genome (5'-3': CGAACAGACACAAAGCGACG)
(SEQ ID NO: 19). Long PCR was performed using
protocols described by the manufacturer (Perkin Elmer,
Foster City, CA). The amplified retroviral fragment
isolated from the breast cancer sample was cloned into
the TA cloning vector (Invitrogen) and automated
sequencing was performed as described in Section 6.1.

7.2. Results

An approximately 2.6 kb retroviral fragment
containing 1,228 bp of the 3' LTR sequence and 1,336 bp
of the env gene sequence of a potential provirus was
detected in a human breast carcinoma tissue sample by
the long PCR technique using the 5L and LTR3 primers.
The sequence of this retroviral fragment is shown in
Fig. 9 (SEQ ID NO: 20).

When compared with the two strains of MMTV, C3H
and BR6, the sequence homology was 90.8% and 90.7%,
respectively, over the MMTV genomic fragment from
nucleotides 7370-9937. When compared with the
endogenous retroviral sequences (HUMERKA), sequence
homology was only 58% in 36 bp and 71% in 74 bp.
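
One simple consistency check on a PCR-derived fragment is that it is
bounded by the two primers, with one end matching a primer directly
and the other matching the reverse complement of the opposite primer.
The sketch below applies this to the first and last 20 nucleotides of
SEQ ID NO: 20 as printed in the sequence listing; the helper name
reverse_complement and the variable names are hypothetical, and only
the terminal 20-mers are used rather than the full sequence.

```python
# Check that the termini of the long-PCR fragment correspond to the primers.
# Sequences are copied from the sequence listing (SEQ ID NO: 11, 19, 20);
# helper and variable names are illustrative assumptions.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

PRIMER_5L = "CCAGATCGCCTTTAAGAAGG"       # SEQ ID NO: 11
PRIMER_LTR3 = "CGAACAGACACAAACACACG"     # SEQ ID NO: 19 as listed
FRAGMENT_FIRST_20 = "CGAACAGACACAAACACACG"  # start of SEQ ID NO: 20
FRAGMENT_LAST_20 = "CCTTCTTAAAGGCGATCTGG"   # end of SEQ ID NO: 20

if __name__ == "__main__":
    print("starts with LTR3:", FRAGMENT_FIRST_20 == PRIMER_LTR3)
    print("ends with revcomp(5L):",
          FRAGMENT_LAST_20 == reverse_complement(PRIMER_5L))
```

On the listed sequences both comparisons hold, which suggests that
SEQ ID NO: 20 is written starting from the LTR3 primer and ending at
the complement of primer 5L.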

8. Discussion
The search for virus-related sequences in human breast
cancer has been hampered by great variation reported
in previous studies, by the presence of endogenous
retroviral sequences in human DNA and by the lack of
sensitivity of the methods employed. The studies
reported herein circumvent these deficiencies by
focusing on sequences with low homology to human
endogenous retroviruses, by investigating a large
number of tumors and several types of controls and
by using the most sensitive technology presently
available.

The results indicate that unique MMTV env gene
sequences were present in 38.5% of the breast cancer
samples analyzed and 39.7% of archival samples of
breast cancer, and that these sequences were absent in
normal tissues, including lymphocytes from patients
with positive breast cancer, and in cancers other than
breast. Normal breast tissue and fibroadenomas had a
low frequency (1.8 to 6.9%) of positive results. When
cloned and sequenced, the sequences were found to be
highly homologous to the MMTV env gene, but not to the
endogenous retroviral sequences. Furthermore,
experiments in which the cloned amplified sequences
were used for hybridization with DNA from breast cancer
or normal tissues revealed that homologous DNA was only
present in breast cancer DNA. The results also
indicate that a human breast carcinoma sample contained
an approximately 2.6 kb MMTV-like fragment comprising
1,336 bp of the env gene and 1,228 bp of the 3' LTR.

The detection of MMTV env gene sequences in two
fibroadenomas out of 29 and in two normal breast tissue
samples out of 107 samples is of uncertain
significance. Although such results could potentially
be artifactual, and thus may represent false positives,
they may alternatively indicate the presence of
histologically unrecognized cells that were or will be
neoplastic.

Ninety percent (90%) of the breast cancers tested
were invasive ductal carcinomas, which reflects the
prevalence of this type of neoplasm. Most patients
were node-positive, which is probably artifactual since
it was necessary that tumor size be sufficiently large
to provide an aliquot for research, and tumor size
correlates with node positivity.

It is unlikely that differences in homology
between the MMTV env gene and the cloned human
sequences are generated by errors committed by the Taq
polymerase. It has been estimated that the rate of
nucleotide misincorporation is 1 x 10^-5 per cycle
(Ehrlich et al., 1991, Science 252:1643-1651) and
therefore only a total of 0.32 misincorporated
nucleotides should be expected in 660 bp after
50 cycles. The differences in homology between clones
from different patients are likely to represent
heterogeneity of the env gene.
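
The expected number of polymerase errors can be approximated as the
product of the per-nucleotide, per-cycle error rate, the fragment
length and the number of cycles. The sketch below is a
back-of-the-envelope illustration under that simplifying assumption
(the function name expected_misincorporations is hypothetical); the
rate, length and cycle count are those quoted in the text.

```python
# Back-of-the-envelope estimate of Taq misincorporations in an amplicon.
# Assumption: errors accumulate linearly as rate * length * cycles.

def expected_misincorporations(rate_per_nt_per_cycle: float,
                               length_bp: int,
                               cycles: int) -> float:
    """Expected number of misincorporated nucleotides per final molecule."""
    if rate_per_nt_per_cycle < 0 or length_bp < 0 or cycles < 0:
        raise ValueError("inputs must be non-negative")
    return rate_per_nt_per_cycle * length_bp * cycles

if __name__ == "__main__":
    estimate = expected_misincorporations(1e-5, 660, 50)
    print(f"expected misincorporations: {estimate:.2f}")
```

With the quoted values this simple product comes to about 0.33, close
to the 0.32 figure given above and far below the number of
differences implied by 95% to 99% homology over 660 bp.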

In contrast to earlier, ambiguous data associating
MMTV-like sequences with human breast cancer, we have
clearly demonstrated the existence of such sequences in
breast cancer cells which cannot be explained by any
known human endogenous retroviral sequence. Our data do
not support the results of earlier studies which
indicated that, as in the mouse, MMTV-like sequences
were found in lymphocytes from two patients with breast
cancer (Crepin, M. et al., 1984, Biochem. Biophys. Res.
Comm. 118:324-331). The absence of MMTV env-like
sequences in lymphocytes could reflect the fate of a
unique lymphocyte subset over decades between initial
encounter and the appearance of clinical breast cancer;
alternatively, the human disease may differ from the
mouse model. Results from attempts to identify unique
MMTV-like pol gene sequences have shown that they
cannot be distinguished from the reverse transcriptase
sequences of endogenous retroviruses (Deen, K.C. and
Sweet, R.W., 1986, J. Virol. 57:422-432).

The origin of the MMTV env gene-like and 3'
LTR-like sequences found in tumor DNA could be the
result of integrated MMTV-like sequences from a human
mammary tumor virus. Polymorphism of endogenous
retroviral sequences is conceivable but can be ruled
out because these sequences were not detected in
lymphocytes from the positive patients, in sections of
the cancerous breast from which abnormal cells were
absent, or in normal breast tissue from patients with
MMTV env-like positive tumors. Recombination during
tumorigenesis between endogenous sequences to resemble
the MMTV env genes seems highly unlikely since no known
gene or viral sequence is more than 18% homologous to
the 660 bp sequence. The longer, approximately 2.6 kb
MMTV-like fragment detected in a human breast carcinoma
had minimal homology (58% in 36 bp and 71% in 74 bp) to
endogenous human retroviral sequences. Thus, the most
conservative interpretation is that our findings
represent exogenous sequences from an agent similar to
MMTV. Recombination between endogenous and exogenous
env gene sequences is known to accelerate the
development of malignancies in mice (DiFronzo, N.L. and
Holland, C.A., 1993, J. Virol. 67:3763-3770). Whether
the MMTV-like sequences belong to an entire acquired
provirus or to an exogenous fragment integrated into
endogenous sequences is presently not known.
Experiments are in progress to distinguish between
these possibilities.

Several genetic alterations have been identified
in human breast cancer that can be useful as markers
for prevention, detection or prognosis (reviewed in
Runnebaum, I. et al., 1991, Proc. Natl. Acad. Sci. USA
88:10657-10661). The BRCA1 and BRCA2 genes have
recently been described. They account for at least
5% of breast cancer and are related to familial breast
cancer (Miki, Y. et al., 1994, Science 266:66-71;
Wooster, R. et al., 1994, Science 265:2088-2090). We
have preliminary evidence that familial clustering of
the MMTV env gene-like sequences occurs, accounting for
an even higher percentage of cancers in affected
families (Holland et al., 1994, Proc. Am. Assoc. Cancer
Res. 35:218). The presence of MMTV-like sequences may
be correlated with special clinical disease status, may
provide another potential molecular marker, and may
distinguish a subset of human breast cancer for which
viral etiology is tenable. This has implications for
epidemiology, therapy and prevention.

Various publications are cited herein, the
contents of which are hereby incorporated by reference
in their entireties.
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CCTCACTGCC AGATC 15

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
GGGAATTCCT CACTGCCAGA TC 22

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CCTCACTGCC AGATCGCCT 19

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
TACATCTGCC TGTGTTAC 18



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
CCTACATCTG CCTGTGTTAC 20

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
CCGCCATACG TGCTG 15

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
ATCTGTGGCA TACCT 15

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GGGAATTCAT CTGTGGCATA CCT 23

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
ATCTGTGGCA TACCTAAAGG 20



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
GAATCGCTTG GCTCG 15



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CCAGATCGCC TTTAAGAAGG 20
(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
TACAGGTAGC AGCACGTATG 20

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
Leu Lys Arg Pro Gly Phe Gln Glu His Glu Met Ile
1               5                   10



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
Gly Leu Pro His Leu Ile Asp Ile Glu Lys Arg Gly
1               5                   10



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
Thr Asn Cys Leu Asp Ser Ser Ala Tyr Asp Thr Ala
1               5                   10



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
Asp Ile Gly Asp Glu Pro Trp Phe Asp Asp
1               5                   10



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 662 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

TCCTCACTGC CAGATCGCCT TTAAGAAGGA CGCCTTCTGG GAGGGAGACG AGTCTGCTCC 60
TCCACGGTGG TTGCCTTGCG CCTTCCCTGA CCAAGGGGTG AGTTTTTCTC CAAAAGGGGC 120
CCTTGGGTTA CTTTGGGATT TCTCCCTTCC CTCGCCTAGT GTAGATCAGT CAGATCAGAT 180
TAAAAGCAAA AAGGATCTAT TTGGAAATTA TACTCCCCCA GTCAATAAAG AGGTTCATCG 240
ATGGTATGAA GCAGGATGGG TAGAACCTAC ATGGTTCTGG GAAAATTCTC CTAAGGATCC 300
CAATGATAGA GATTTTACTG CTCTAGTTCC CATACAGAAT TGTTTCGCTT AGTTGCAGCC 360
TCAAGATATC TTATTCTCAA AAGGCAGGAT TTCAGGAACA TGAGATGATT CCTACATCTC 420
TGTGTTACTT ACCCTTATGT CATATTATTA GGATTACCTC AGCTAATAGA TATAGAGAAA 480
GAGGATCTAC TTTTCATATT TCCTGTTCTT CTTGTAGATT GACTAATTGT TTAGATTCTT 540
CTGCCTACGA CTATGCAGCG ATCATAGTCA AGAGGCCGCC ATACGTGCTG CTACCTGTAG 600
ATATTGGTGA TGAACCATGG TTTGATGATT CTGCCATTCA AACCTTTAGG TATGCCACAG 660
AT 662

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 663 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

TCCTCACTGN CAGATCGCCT TTAAGAAGGA CGCCTTCTGG GAGGGAGACG AGTCTGCTCC 60
TCCACGGTGG TTGACTTGCG CCTTCCCTGA CCAGGGGGTG AGTTTTTCTC CAAAAGGGGC 120
CCTTGGGTTA CTTTGGGATT TCTCCCTTCC CTCGCCTAGT GTAGATCAGT CAGATCAGAT 180
TAAAAGCAAA AAGGATCTAT TTGGAAATTA TACTCCCCCT GTCAATAAAG AGGTTCATCG 240
ATGGTATGAA GCAGGATGGG TAGAACCTAC ATGGTTCTGG GAAAATTCTC CTAAGGATCC 300
CAATGATAGA GATTTTACTG CTCTAGTTCC CATACAGAAT TGTTTCGCTT AGTTGCAGCC 360
TCAAGATATC TTATTCACAA AAGGCAGGAT TTCAAGAACA TGACATGAAT CCCTACATCT 420
CTGTGTTACT TACCCTTATG CCANANTATT AGGATTACCT CAGCTAATAG ATATAGAGGA 480
AGAGGATCTA CTTTTCATAT TTCCTGTTCT TCTTGTAGAT TGACTAATTG TTTAGATTCT 540
TCTGCCTACG ACTATGCAGC GATCATAGTC AAGAGGCCGC CATACGTGCT GCTACCTGTA 600
GATATTGGTG ATGAACCATG GTTTGATGAN NCTGCCANTC AAACCTTTAG GTATNCCACA 660
GAT 663



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
CGAACAGACA CAAACACACG 20



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2598 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:



WO 97/17470 PCT/US96/17877 

35 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

CGAACAGACA CAAACACACG AGAGGTGAAT GTTAGGACTG TTGCAAGTTT ACTCAAAAAA 60
CAGCACTCTT TTATATCATG GTTTACATAA GCATTTACAT AAGACTTGGA TAAGTTCCAA 120
AAGAACATAG GAGAATAGAA CACTCAGAGC TTAGATCAAA ACATTTGATA CCAAACCAAG 180
TCAGGAAACC ACTTGTCTCA CATCCTTGTT TTAAGAACAG TTTGTGACCC TGAACTTACT 240
TAAACCTTGG GAACCGCAAN GTTGGGCTCA TAAAGGTTAT CCATTATAGC TCATGCCAAA 300
ATTATCTGCA GAAATGTGTT CCTAATTGTC TAGCCACTGC CCCCTCCCTT GGTATAATGA 360
AAATCTTTCC CCCAACGTTC ATCCCACTCC CCTAGATAAA TATAATCATG TACCTGTTGT 420
TTTATGTCGT CTTTTTCTTC CTGAGTTAAC ACACACCAAG GAGGTCTAGC TCTGGCGAGT 480
CTTTCACGAA AGGGGAGGGA TCTGTACAAC ACTTTATAGC CGTTGACTGT GACCCACCTA 540
TCGAAATTTA AATCGTATCT TCCTGTATAT GGTAGCGGGG CGTCTGTTGG TCTGTAGATG 600
TAAGTCCCGG TTGCCACCAC CTGTCTCCTA TTTTGACAAG CGTACTCCTC TTTCCCCTTT 660
TTACTTCTAG GCCTGAGGCC CTTAGTCCTT GCACCTGTTC TTCAACTGAG GTTGAGCGTC 720
TCTTTCTATT TTCTATTCCC ATTTCTAACC TTTGAATTTG AGTAAATATA GTGCTAAAAG 780
ACAAAGATTC ATTTCTTAAC ATCATGATTA ATAATCGACC TATTGGATTG GTCTTATTGG 840
TAAAAATATA ATTTTTAGCA AGCATTCTTA TTTCTATTTC TGAAGGACAA AGTCGGTGTG 900
GCTTGTAANA GGAANTTGGC TGTGGTCCTT GCCCCACGAG GAAGGTCGAG TTCTCCGAAT 960
TGTTTAGATT GTAATCTTGC ACAGAAGAGT TATTAAAAGA ATCAAGGGTG AGAGCCCTGC 1020
GAGCACGAAC CGCAACTTCC CCCAATAGCC CCAGGCAAAG CAGAGCTATG CCAAGTTTGC 1080
AGCAGANAAT GAGTATGTCT TTGTCTGATG GGCTCATCCG CGTGCACGCA GACGGGTCGT 1140
CCTTGGTGGG AAACAACCCC TTGGCTGCTT CTCTCCTAAG TGTAGGACAC TCTCGGGAGT 1200
TCAACCATTT CTGCTGCAGG CGCGGCATTT CCCCCTTTTT TCTTTTTTAA AAGAAGCACG 1260
TTAAGATCTG ACTGCACTTG GTCAAGGCTC TTCGCAAAGC ACTGGAAAAT AACGGGGAAA 1320
ATCATAAGTA CTATGACCAA AAGCAGGGCT CCAACTCCTA TAAAAATGAA ATATTGTGTT 1380
CTAATCCAAT GGATTTAAAG CCTTTACTCC ATTGGCNAAG GANTGANCCA ACCCCTGAGG 1440
TCCCTGCGTT CAAATTTTTT TGCTCNTATC CTAATCCAAT TGGTAACCCC GTTTNTTTTT 1500
GAAACTCATG TCTTCAAATG CCCAATAAAT GAGCCCTGGT TCTTTCCCAG CTCTCAGAAG 1560
CATTATACGG NANAGGTGTG ACACAGCATA AAATCATAAT TTGCATGACA CCTAGTGGAC 1620
ATTCTGGTCT TTAAGTTTGC CACATCTTGT CCCAACTCTA AAACTACTTC TTCTAAAGCA 1680
TTAAGTCTAG CTTTCAATTT TAAGTCTATT ATTCTTTGTT CAGATNAGGC TAATGTAACA 1740
TTTCTATGAA GATTATTAAC AAACGTAGCA GTTTGCATCT CCTTAACTAA GGCAGTAGTA 1800
GCTACAGCAA AGGAAGTGAT AATAGCAATT AAAGCAGATA TGCCCAGAAT AATGGCAGCG 1860
ACGAATCGCT TAGCTCGAAT TAAATCTGTG GCATACCTAA AGGTTTGAAT GGCAGAATCA 1920
TCAAACCATG GTTCATCACC AATATCTACA GGTTACAACA CATATGGCGG CCCCTTGAAT 1980
ATGAATCGCT GCATATCCGT NGGCAAAAAA TCTAACCATT ATTCCTCCTN CCNAAAAACG 2040
GGATTTGAAA NTTATNCCCC TTNCCCCNAA CCCANACCGA GGTACCCCAT AATGNGGGGG 2100
GTATCTANAA NAGGGCATAG GGGTAAGAAA AACGGCAGAG NGGGATCNTT TATGTTCNGG 2160
AAATTCNGGG TTTGGGAGAA TAAGATTCTG GAGGCTGCAA ATTAAGGGAA ACATTNTGTA 2220
TGGGGAATAG AGCAGTAAAA TCTCTATCAT GGGGATCTTT AGGGAGAATT TTCCCAGGAA 2280
CCAAGTAGGT TCNAACCCAT CNTGCTTCAT ACCATCGATG AACNTCTTTA TTGACAGGGG 2340
GAGTATAATT TCCAAATAGA TCCTTTTTGT TTTTAATCTG ATCTGACTGA TCTACACTAG 2400
GCGGGGGAAG GGAGAAATCC CAAAGTAACC CAAGGGCCCC TTTTGGAGAA AAACTCACCC 2460
CCTGGTCAGG GAAGGCGCAA GGCAACCACC GTGGAGGAGC AGACTCGTCT CCCTCCCAGA 2520
AGGCGTCCTT CTTAAAGGCG ATCTGGAGGA GCAGACTCGT CTCCCTCCCA GAAGGCGTCC 2580
TTCTTAAAGG CGATCTGG 2598
Claims

1. A composition comprising an oligonucleotide primer
which may be used to detect the presence of a
nucleic acid molecule which (i) hybridizes to the
env gene of a mouse mammary tumor virus; (ii) is
present in at least 38 percent of DNA samples
prepared from breast cancer tissue of different
human subjects; and (iii) hybridizes to less than
7 percent of DNA samples prepared from tissues
other than breast cancer tissue from different
human subjects.

2. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCTCACTGCCAGATC (SEQ ID NO: 1).

3. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
GGGAATTCCTCACTGCCAGATC (SEQ ID NO: 2).

4. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCTCACTGCCAGATCGCCT (SEQ ID NO: 3).

5. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
TACATCTGCCTGTGTTAC (SEQ ID NO: 4).

6. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCTACATCTGCCTGTGTTAC (SEQ ID NO: 5).

7. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCGCCATACGTGCTG (SEQ ID NO: 6).

8. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
ATCTGTGGCATACCT (SEQ ID NO: 7).

9. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
GGGAATTCATCTGTGGCATACCT (SEQ ID NO: 8).

10. The composition of claim 1, wherein the
oligonucleotide primer comprises a sequence
selected from the group consisting of
ATCTGTGGCATACCTAAAGG (SEQ ID NO: 9);
GAATCGCTTGGCTCG (SEQ ID NO: 10);
CCAGATCGCCTTTAAGAAGG (SEQ ID NO: 11); and
TACAGGTAGCAGCACGTATG (SEQ ID NO: 12).

11. An essentially purified peptide encoded by a
nucleic acid molecule which (i) hybridizes to
a gene of MMTV; (ii) is present in at least
20 percent of DNA samples prepared from breast
cancer tissue of different human subjects; and
(iii) is present in less than 5 percent of DNA
samples prepared from tissues other than breast
cancer tissue from different human subjects.

12. An antibody which specifically binds to the
peptide of claim 11.

13. The peptide according to claim 11 which comprises
the amino acid sequence LKRPGFQEHEMI (SEQ ID
NO: 13).

14. An antibody which specifically binds to the
peptide of claim 13.

15. The peptide according to claim 11 which comprises
the amino acid sequence GLPHLIDIEKRG (SEQ ID NO: 14).



16. A method of diagnosing breast cancer in a human
subject, comprising detecting the presence of a
peptide encoded by a nucleic acid molecule which
(i) hybridizes to the env gene or 3' LTR of a
mouse mammary tumor virus; (ii) is present in at
least 20 percent of DNA samples prepared from
breast cancer tissue of different human subjects;
and (iii) is present in less than 5 percent of DNA
samples prepared from tissues other than breast
cancer tissue from different human subjects.



17. The method according to claim 16, wherein the
presence of a peptide comprising the amino acid
sequence LKRPGFQEHEMI (SEQ ID NO: 13) is detected
by the binding of an antibody specific to the
peptide.

18. The method according to claim 16, wherein the
presence of a peptide comprising the amino acid
sequence GLPHLIDIEKRG (SEQ ID NO: 14) is detected
by the binding of an antibody specific to the
peptide.

19. The method according to claim 16, wherein the
presence of a peptide comprising the amino acid
sequence TNCLDSSAYDTA (SEQ ID NO: 15) is detected
by the binding of an antibody specific to the
peptide.

20. The method according to claim 16, wherein the
presence of a peptide comprising the amino acid
sequence DIGDEPWFDD (SEQ ID NO: 16) is detected by
the binding of an antibody specific to the
peptide.

21. A composition comprising an oligonucleotide primer
which may be used to detect the presence of a
nucleic acid molecule which (i) hybridizes to a
nucleic acid comprised of a sequence selected from
the group consisting of the env gene and the 3'
LTR of a mouse mammary tumor virus; (ii) is
present in a substantial percentage of DNA samples
prepared from breast cancer tissue of different
human subjects; and (iii) hybridizes to less than
5 percent of DNA samples prepared from tissues
other than breast cancer tissue from different
human subjects.

22. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCAGATCGCCTTTAAGAAGG (SEQ ID NO: 11).

23. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CGAACAGACACAAACACACG (SEQ ID NO: 19).

CGAACAGACAC/^ACACACGAGAGGTfeAATGTTAGGAC . -TTGCAAGTTTA

CTCAAAAAACAGCACTCTTTTATATCATGGTTTACATAAGCATTTACATAAGA 

CTTGGATAAGTTCCAAAAGAACATAGGAGAATAGAACACTCAGAGCTTAGAT 

CAAAACATTTGATACCAAACCAAGTCAGGAAACCACTTGTCTCACATCCTTG 

TTTTAAGAACAGTTTGTGACCCTGAACTTACTTAAACCTTGGGAACCGCAAN 

GTTGGGCTCATAAAGGTTATCCATTATAGCTCATGCCAAAATTATCTGCAGA 

AATGTGTTCCTAATTGTCTAGCCACTGCCCCCTCCCTTGGTATAATGAAAAT 

CTTTCCCCCAACGTTCATCCCACTCCCCTAGATAAATATAATCATGTACCTGT 

TGTTTTATGTCGTCTTTTTCTTCCTGAGTTAACACACACCAAGGAGGTCTAGC 

TCTGGCGAGTCTTTCACGAAAGGGGAGGGATCTGTACAACACTTTATAGCC 

GTTGACTGTGACCCACCTATCGAAATTTAAATCGTATCTTCCTGTATATGGTA 

GCGGGGCGTCTGTTGGTCTGTAGATGTAAGTCCCGGTTGCCACCACCTGTC 

TCCTATTTTGACAAGCGTACTCCTCTTTCCCCTTTTTACTTCTAGGCCTGAGG 

CCCTTAGTCCTTGCACCTGTTCTTCAACTGAGGTTGAGCGTCTCTTTCTATTT 

TCTATTCCCATTTCTAACCTTTGAATTTGAGTAAATATAGTGCTAAAAGACAA 

AGATTCATTTCTTAACATCATGATTAATAATCGACCTATTGGATTGGTCTTATT 

GGTAAAAATATAATTTTTAGCAAGCATTCTTATTTCTATTTCTGAAGGACAAA 

GTCGGTGTGGCTTGTAANAGGAANTTGGCTGTGGTCCTTGCCCCACGAGGA 

AGGTCGAGTTCTCCGAATTGTTTAGATTGTAATCTTGCACAGAAGAGTTATTA 

AAAGAATCAAGGGTGAGAGCCCTGCGAGCACGAACCGCAACTTCCCCCAAT 

AGCCCCAGGCAAAGCAGAGCTATGCCAAGTTTGCAGCAGANAATGAGTATG 

TCTTTGTCTGATGGGCTCATCCGCGTGCACGCAGACGGGTCGTCCTTGGTG 

GGAAACAACCCCTTGGCTGCTTCTCTCCTAAGTGTAGGACACTCTCGGGAG 

TTCAACCATTTCTGCTGCAGGCGCGGCATTTCCCCCTTTTTTCTTTTTTAAAA 

GAAGCACGTTAAGATCTGACTGCACTTGGTCAAGGCTCTTCGCAAAGCACT 

GGAAAATAACGGGGAAAATCATAAGTACTATGACCAAAAGCAGGGCTCCAA 

CTCCTATAAAAATGAAATATTGTGTTCTAATCCAATGGATTTAAAGCCTTTAC 

TCCATTGGCNAAGGANTGANCCAACCCCTGAGGTCCCTGCGTTCAAATTTTT 

TTGCTCNTATCCTAATCCAATTGGTAACCCCGTTTNTTTTTGAAACTCATGTC 

TTCAAATGCCCAATAAATGAGCCCTGGTTCTTTCCCAGCTCTCAGAAGCATT 

ATACGGNANAGGTGTGACACAGCATAAAATCATAATTTGCATGACACCTAGT 

GGACATTCTGGTCTTTAAGTTTGCCACATCTTGTCCCAACTCTAAAACTACTT 

CTTCTAAAGCATTAAGTCTAGCTTTCAATTTTAAGTCTATTATTCTTTGTTCAG 

ATNAGGCTAATGTAACATTTCTATGAAGATTATTAACAAACGTAGCAGTTTGC 

ATCTCCTTAACTAAGGCAGTAGTAGCTACAGCAAAGGAAGTGATAATAGCAA 

TTAAAGCAGATATGCCCAGAATAATGGCAGCGACGAATCGCTTAGCTCGAAT 

TAAATCTGTGGCATACCTAAAGGTTTGAATGGCAGAATCATCAAACCATGGT 

TCATCACCAATATCTACAGGTTACAACACATATGGCGGCCCCTTGAATATGA 

ATCGCTGC ATATCCGTNGGCAAAAAATCTAACCATTATTCCTCCTNCCNAAA 

AACGGGATTTGAAANTTATNCCCCTTNCCCCNAACCCANACCGAGGTACCC 

CATAATGNGGGGGGTATCTANAANAGGGCATAGGGGTAAGAAAAACGGCA 

GAGNGGGATCNTTTATGTTCNGGAAATTCNGGGTTTGGGAGAATAAGATTCT 

GGAGGCTGCAAATTAAGGGAAACATTNTGTATGGGGAATAGAGCAGTAAAA 

TCTCTATCATGGGGATCTTTAGGGAGAATTTTCCCAGGAACCAAGTAGGTTC 

NAACCCATCNTGCTTCATACCATCGATGAACNTCTTTATTGACAGGGGGAGT 

ATAATTTCCAAATAGATCCTTTTTGTTTTTAATCTGATCTGACTGATCTACACT 

AGGCGGGGGAAGGGAGAAATCCCAAAGTAACCCAAGGGCCCCTTTTGGAG 

AAAAACTCACCCCCTGGTCAGGGAAGGCGCAAGGCAACCACCGTGGAGGA 

GCAGACTCGTCTCCCTCCCAGAAGGCGTCCTTCTTAAAGGCGATCTGGAGG 

AGCAGACTCGTCTCCCTCCCAGAAGGCGTCCTTCTTAAAGGCGATCTGG 






INTERNATIONAL SEARCH REPORT 



International application No. 
PCT/US96/ 17877 



A. CLASSIFICATION OF SUBJECT MATTER 
IPC(6) : Please See Extra Sheet.

US CL : 435/6, 5, 91.2, 7.1, 7.2; 536/23.1, 24.3, 24.33; 530/388.1, 300
According to International Patent Classification (IPC) or to both national classification and IPC 



B. FIELDS SEARCHED 



Minimum documentation searched (classification system followed by classification symbols) 
U.S. : 435/6, 5, 91.2, 7.1, 7.2; 536/23.1, 24.3, 24.33; 530/388.1, 300



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 



Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 
Please See Extra Sheet. 



DOCUMENTS CONSIDERED TO BE RELEVANT 



Citation of document, with indication, where appropriate, of the relevant passages (relevant to claim No.)

REDMOND et al. Sequence and expression of the mouse
mammary tumour virus env gene. The EMBO Journal. 1983,
Volume 2, Number 1, pages 125-131. See entire document.
(Relevant to claims 1-20.)

FAFF et al. Retrovirus-like particles from the human T47D
cell lines are related to mouse mammary tumour virus and are
of human endogenous origin. Journal of General Virology.
May 1992, Volume 73, pages 1087-1097. See abstract.
(Relevant to claims 1-20.)



[X] Further documents are listed in the continuation of Box C. [ ] See patent family annex.






Date of the actual completion of the international search 
04 FEBRUARY 1997 


Date of mailing of the international search report 

18 MAR 1997


Name and mailing address of the ISA/US 
Commissioner of Patents and Trademarks 
Box PCT 

Washington, D.C. 20231 
Facsimile No. (703) 305-3230 


Authorized officer 

DIANNE REES
Telephone No. (703) 308-0196 



Form PCT/ISA/210 (second sheet) (July 1992)






C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT 






CREPIN et al. Sequences Related to Mouse Mammary Tumor
Virus Genome in Tumor Cells and Lymphocytes from Patients
with Breast Cancer. Biochemical and Biophysical Research
Communications. 13 January 1984, Volume 118, Number 1, pages
324-331. See entire document. (Relevant to claims 1-20.)

MESA-TEJADA et al. Detection in human breast carcinomas of
an antigen immunologically related to a group-specific antigen of
mouse mammary tumor virus. Proceedings of the National
Academy of Sciences, USA. March 1978, Volume 75, Number 3,
pages 1529-1533. (Relevant to claims 1-20.)






A. CLASSIFICATION OF SUBJECT MATTER: 
IPC (6): 

C12Q 1/68, 1/70; C12P 19/34; C07H 21/02, 21/04; G01N 33/53; C07K 15/28; 5/00 

B. FIELDS SEARCHED 

Electronic data bases consulted (Name of data base and where practicable terms used): 

APS, BIOSIS, BIOTECHABS, BIOTECHDS, CABA, CAPLUS, CANCERLIT, DGENE, DRUGU, EMBASE,
MEDLINE, USPATFULL, TOXLIT, TOXLINE, JAPIO, WPIDS

search terms: MMTV, mouse mammary tumor virus, PCR, hybridization, antibodies, immunoassays, Westerns,
searched SEQ ID NOs.







CORRECTED 
VERSION* 



WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau 




PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6 :
C12Q 1/68, 1/70, C12P 19/34, C07H 21/02, 21/04, G01N 33/53, C07K 15/28, 5/00

A1

(11) International Publication Number: WO 97/17470
(43) International Publication Date: 15 May 1997 (15.05.97)



(21) International Application Number: PCT/US96/17877

(22) International Filing Date: 8 November 1996 (08.11.96)

(30) Priority Data: 08/555,394, 9 November 1995 (09.11.95), US



(71) (72) Applicant and Inventor: HOLLAND, James, F. [US/US];
31 Mamaroneck Road, Scarsdale, NY 10583 (US).

(72) Inventor: POGO, Beatriz, G., T.; 237 Nyac Avenue, Pelham,
NY 10803 (US).

(74) Agents: CLARK, Richard, S. et al.; Brumbaugh, Graves,
Donohue & Raymond, 30 Rockefeller Plaza, New York,
NY 10112 (US).

(81) Designated States: AU, CA, JP, European patent (AT, BE,
CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL,
PT, SE).



Published 

With international search report. 

Before the expiration of the time limit for amending the 
claims and to be republished in the event of the receipt of 
amendments. 



(54) Title: DETECTION OF MAMMARY TUMOR VIRUS-LIKE SEQUENCES IN HUMAN BREAST CANCER 
(57) Abstract 

The present invention relates to materials and methods for diagnosing breast cancer in humans. It is based, at least in part, on the 
discovery that a substantial percentage of human breast cancer tissue samples contained nucleic acid sequences corresponding to a portion 
of the mouse mammary tumor virus env gene. In contrast, such sequences were absent in almost all other human tissues tested. 



* (Referred to in PCT Gazette No. 31/1997, Section II)


Description 

Detection Of Mammary Tumor Virus-Like 
Sequences In Human Breast Cancer 

Cross-Reference to Related Application 

This application is a continuation-in-part 
application of U.S. Serial No. 08/555,394, filed 
November 9, 1995. 

Statement Regarding Federally Sponsored Research 
This invention was made with funds from the U.S.
government, which has certain rights in the invention.

Introduction 

The present invention relates to materials and
methods for diagnosing breast cancer in humans. It is
based, at least in part, on the discovery that a
substantial percentage of human breast cancer tissue
samples contained nucleic acid sequences corresponding
to a portion of the mouse mammary tumor virus env gene.
In contrast, such sequences were absent in almost all
other human tissues tested.

Background of the Invention 

A large body of information has accumulated about
the molecular biology of MMTV (reviewed in Slagle,
B.L. et al., 1987, in "Cellular and Molecular Biology
of Mammary Cancer", Kidwell et al., eds., Plenum Press,
NY, pp 275-306). Mouse mammary tumor virus (MMTV) is
associated with a high incidence of breast cancer in
certain strains of mice (over 90% among females), and
has been regarded as a potential model for human
disease.

The MMTV virus does not carry a transforming
oncogene, but rather acts as an insertional mutagen
with several proviral insertion loci designated int-1
or wnt-1 (Nusse, R. et al., 1982, Cell 31:99-109), int-2
(Peters, G. et al., 1983, Cell 33:369-377), int-3
(Gallahan, D. et al., 1987, J. Virol. 61:218-220), int-4
(Roelink, H. et al., 1990, Proc. Natl. Acad. Sci. USA
87:4519-4523) and int-5 (Morris, V.L. et al., 1991,
Oncogene Research 6:53-63), which encode growth
factors or other related proteins. These genes are not
expressed in normal mammary tissue but become activated
after integration of MMTV provirus into the adjacent
chromosomal DNA.

The human homolog of the int-2 locus has been
located on chromosome 11 (Casey, G. et al., 1986,
Mol. Cell Biol. 6:502-510) and has been found amplified
(in 15% of the breast cancers) and also expressed
(Lidereau, R. et al., 1988, Oncogene Res 2:285-291;
Zhou, D.J. et al., 1988, Oncogene 2:279-282; Liscia,
D.S. et al., 1989, Oncogene 4:1219-1224; Meyers, S.L.
et al., 1990, Cancer Res 50:5911-5918). It may be
significant that in tumors from Parsi women, who have a
high incidence of breast tumors, the int-2 locus is
amplified in 50% of the cases (Barnabas-Sohi, N. et
al., 1993, Breast Dis. 6:13-26). The amplification of
int-2 and other genes in 11q13 is indicative of poor
prognosis (Schuuring, E. et al., 1992, Cancer Research
52:5229-5234; Champeme, M-H. et al., 1995, Genes,
Chromosomes and Cancer 12:128-133). Both mouse and
human int-2 have been sequenced (Moore, R. et al.,
1986, EMBO J 5:919-924). The gene encodes a protein of
about 27 kilodaltons (KD) which shows homology to both
basic and acidic fibroblast growth factors (Dickson, C.
et al., 1987, Nature (London) 326:833).

However, efforts to demonstrate the presence of
viruses in human breast cancer through search for viral
particles, immunological cross-reactivity, or sequence
homology have yielded contradictory results. Detectable
MMTV env gene-related antigenic reactivity
has been found in tissue sections of breast cancer
(Mesa-Tejada et al., 1978, Proc. Natl. Acad. Sci. USA
75:1529-1533; Levine, P. et al., 1980, Proc. Am. Assoc.
Cancer Res. 21:170; Lloyd, R. et al., 1983, Cancer
51:654-661), breast cancer cells in culture (Litvinov,
S.V. and Golovkina, T.V., 1989, Acta Virologica
33:137-142), human milk (Zotter, S. et al., 1980, Eur. J.
Cancer 16:455-467), in sera of patients (Day, N.K.
et al., 1981, Proc. Natl. Acad. Sci. USA 78:2483-2487),
in cyst fluid (Witkin, S.S. et al., 1981, J. Clin.
Invest. 67:216-222) and in particles produced by a
human breast carcinoma cell line (Keydar, I. et al.,
1984, Proc. Natl. Acad. Sci. USA 81:4188-4192).
Sequence homology to MMTV has been found in human DNA
under low stringency conditions of hybridization
(Callahan, R. et al., 1982, Proc. Natl. Acad. Sci. USA
79:5503-5507) and RNA related to MMTV has been detected
in human breast cancer cells (Axel, R. et al., 1972,
Nature 235:32-36). The presence of MMTV-related
sequences in lymphocytes from patients with breast
cancer has been reported (Crepin, M. et al., 1984,
Biochem. Biophys. Res. Comm. 118:324-331), as well as
detection of reverse transcriptase (RT) activity in
their monocytes (Al-Sumidaie, A.M. et al., 1988, Lancet
1:5-8). May and Westley (May and Westley, 1989, Cancer
Research 49:3879-3883) have reported the presence of
MMTV-like sequences arranged as tandem repeats only in
DNA from breast cancer cells.

These results have been difficult to interpret,
and theories linking MMTV or a related virus with human
breast cancer have fallen out of favor, in view of the
relatively recent discovery of human endogenous
retroviral sequences ("HERs"; Westley, B. et al., 1986,
J. Virol. 60:743-749; Ono, M. et al., 1986, J. Virol.
60:589-598; Faff, O. et al., 1992, J. Gen. Virology
73:1087-1097). Data which could be interpreted to
demonstrate the presence of MMTV-related sequences
could be more readily explained by endogenous human
retroviral sequences. Adding further confusion to the
picture, env-gene related antigenicity has been
detected in epitopes of human proteins (Hareuveni, M.
et al., 1990, Int. J. Cancer 46:1134-1135).

Brief Summary of the Invention

The present invention relates to methods for
diagnosing breast cancer in humans in which the
presence of mouse mammary tumor virus env gene-like
sequences bears a positive correlation to the existence
of malignant breast disease. It is based, at least in
part, on the discovery that 38 to 40 percent of human
breast cancer tissue samples tested contained gene
sequences homologous to the mouse mammary tumor virus
env gene that are substantially absent from other human
tumors and tissues. The invention also relates to
methods for diagnosing breast cancer in humans in which
the presence of retrovirus proviral fragments
substantially homologous to the env gene and/or 3' LTR
sequence of MMTV are detected. The molecular probes
used in these experiments were designed to avoid
cross-hybridization with endogenous human retroviral
sequences. The present invention further provides for
compositions of molecular probes which may be utilized
in such diagnostic methods.

Brief Description of the Figures

FIGURE 1: Amplification of 660 bp of MMTV-like
env gene. DNA was extracted from frozen tissues. PCR
was performed using primers 1 and 3. A: 2% agarose
gel electrophoresis. B: Southern blot hybridization
using 5'-32P-end-labeled probe 2. Lanes 1 and 3: breast
cancer; lanes 2 and 4: normal breast; lane 5: control
reaction (no DNA); lane E: MMTV env gene. M: molecular
weight marker. Arrow indicates 510 bp band.

FIGURE 2: Nested PCR. A: 2% agarose gel
electrophoresis. 1: Amplification of 686 bp of MMTV-like env
gene sequences using primers 1 and 4 and the product of
reaction A 1 as template, 2: Amplification of 250 bp
of MMTV-like env gene sequences using primers 2 and 3.
B, 1 and 2: Southern blot hybridization of the
amplified products using 5'-32P end-labeled probe 2a.

FIGURE 3: Amplification of 250 bp of MMTV-like
env gene. DNA was extracted from paraffin-embedded
tissue sections. PCR was performed using primers 2 and
3. A: 2% agarose gel electrophoresis. B: Southern blot
hybridization using 5'-32P-labeled probe 2a. Lane 1:
normal breast; lanes 2 to 5: breast cancer; lane E:
MMTV env gene. M: molecular weight marker. Arrow
indicates 298 bp band.

FIGURE 4: Nucleotide sequence of the cloned MMTV
env gene-like sequences as compared to the env
sequences of the GR and BR6 strains of MMTV using the
GCG program. *: potential glycosylation site, |: mismatch
to MMTV.

FIGURE 5: Southern blot hybridization of genomic
DNA. DNA was extracted from frozen tissues or cell
lines, digested with EcoRI and transferred to
nitrocellulose paper. Hybridization with 32P-labeled
clone 166. DNA from A, B, and G: env gene positive
breast cancer; C and D: env negative breast cancer;
E and F: normal breast; H: MCF-7 cells. M: molecular
weight marker. Arrow indicates 9 kb band.

FIGURE 6: Southern blot hybridization of genomic
DNA. Experimental conditions as in Fig. 5. DNA from
A and B: env negative breast cancer; C and D: env
positive breast cancer; E: molecular weight marker
(non-labelled); F to H: normal breast. Arrow indicates
position of 9 kb marker.

FIGURE 7: Map of MMTV.

FIGURE 8: Comparison of the nucleic acid sequence
of mouse mammary tumor env gene ("MMTENV"), showing
residues 976-1640, with the nucleic acid sequence of a
representative 660 bp sequence obtained by PCR reaction
of DNA from human breast cancer tissue ("MS1627").

FIGURE 9: Sequence of an about 2.6 kb MMTV-like
fragment detected in a human breast carcinoma.

Detailed Description of the Invention

The present invention relates to methods and 
compositions for diagnosing breast cancer in humans. 

The present invention provides for compositions
comprising an isolated and purified nucleic acid
molecule which (i) hybridizes to a gene of mouse
mammary tumor virus; (ii) is present in at least
20 percent of DNA samples prepared from breast cancer
tissue of different human subjects; and (iii) is
present in less than 5 percent of DNA samples prepared
from tissues other than breast cancer tissue from
different human subjects. A "gene of mouse mammary
tumor virus" includes, but is not limited to, the gag,
pol, and env genes and the 5' LTR and 3' LTR sequences
of MMTV. In preferred embodiments of the invention,
the mouse mammary tumor virus (hereafter "MMTV") gene
is the env gene and/or the 3' LTR sequence. The term
"hybridize" is used to refer to routine DNA-DNA or
DNA-RNA hybridization techniques under what would be
regarded, by the skilled artisan, as stringent
hybridization conditions. The phrase "is present"
indicates that a native form of the molecule, in an
unpurified state (for example, as part of chromosomal
DNA), may be detected by a standard laboratory
technique, such as Southern blot or polymerase chain
reaction (PCR). To be "present", the molecule may be
detectable by one technique but not others. To be
present in "less than 5 percent of DNA samples prepared
from tissues other than breast cancer tissue from
different human subjects", all non-breast cancer tissue
samples are considered together, but the total number
of samples must be large enough to give the 5 percent
value statistical significance that would be reasonable
to the skilled artisan.
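
By way of illustration only, the requirement that the number of non-breast-cancer samples be large enough to give the 5 percent value statistical significance may be sketched as a one-sided exact binomial test. The specification does not prescribe any particular test; the function names `binomial_cdf` and `below_threshold` and the significance level `alpha` are assumptions introduced here for the example.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Probability of observing k or fewer positives in n samples
    when each sample is positive with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def below_threshold(positives: int, total: int,
                    threshold: float = 0.05, alpha: float = 0.05) -> bool:
    """Return True when the observed count is significantly below
    `threshold` (one-sided exact binomial test at level `alpha`).

    Both `threshold` and `alpha` are illustrative defaults, not values
    taken from the specification beyond the 5 percent figure itself.
    """
    if total <= 0 or not 0 <= positives <= total:
        raise ValueError("need 0 <= positives <= total and total > 0")
    p_value = binomial_cdf(positives, total, threshold)
    return p_value < alpha


if __name__ == "__main__":
    # Hypothetical panels with no positive samples: a small panel cannot
    # support the "less than 5 percent" statement, a larger one can.
    for n in (10, 58, 59, 100):
        print(n, below_threshold(0, n))
```

Under these assumptions, a panel in which no sample is positive needs roughly sixty samples before the observed rate can be called significantly below 5 percent.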

In order to identify such a nucleic acid molecule,
the sequence of MMTV may be compared, using a computer
database, to known human DNA sequences, and portions
of MMTV which are less than or equal to 25 percent
homologous to a human sequence may be selected for
further study. The term "homologous", as used herein,
refers to the presence of identical residues; for
example, a first sequence is considered 25 percent
homologous to a second sequence if 25 percent of the
residues of the first sequence are identical to the
corresponding residues of the second sequence. Since there is
relatively greater likelihood that MMTV may bear
similarity to human retroviral-like sequences, it may
be preferable to evaluate whether a particular MMTV
nucleic acid sequence is homologous to such sequences,
for example, endogenous human retrovirus sequences.
A prototype of such viruses is HERV-K10 (Ono, M.
et al., 1986, J. Virol. 60:589-598).
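
The definition of "homologous" given above, namely the percentage of identical residues, may be illustrated by the following minimal sketch. It assumes the two sequences have already been aligned to equal length by some other program; the names `percent_identity` and `passes_screen` are hypothetical and are not part of the sequence analysis programs referred to in this specification.

```python
def percent_identity(first: str, second: str) -> float:
    """Percentage of residues of `first` that are identical to the
    residue at the same aligned position of `second`.

    Assumes both strings are pre-aligned and of equal length; gap
    characters ('-') in `first` are not counted as residues.
    """
    if len(first) != len(second):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(first.upper(), second.upper()) if a != "-"]
    if not pairs:
        raise ValueError("first sequence contains no residues")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def passes_screen(identity_percent: float, cutoff: float = 25.0) -> bool:
    """True when a candidate MMTV portion is at or below the cutoff
    (25 percent in the text) and so may be selected for further study."""
    return identity_percent <= cutoff


if __name__ == "__main__":
    # Toy aligned sequences, not taken from MMTV or HERV-K10.
    identity = percent_identity("ACGTACGT", "ACGTTTTT")
    print(identity, passes_screen(identity))
```

A real comparison would first require an alignment step, which this sketch deliberately leaves out.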

Once an MMTV gene sequence which is less than or
equal to 25 percent homologous to a human DNA sequence,
such as a human endogenous retroviral sequence, is
identified, the presence of nucleic acid molecules
having the MMTV gene sequence in human breast cancer
tissues and other tissues may be evaluated. Such
evaluations may be performed either by Southern blot
techniques, or, preferably, by polymerase chain
reaction (PCR) techniques, which are more sensitive.
In such a way, MMTV gene sequences which (i) hybridize
to at least 20 percent of DNA samples prepared from
breast cancer tissue of different human subjects and
(ii) hybridize to less than 5 percent of DNA samples
prepared from human tissues other than breast cancer
tissues may be identified. A nucleic acid molecule
having a MMTV gene sequence which satisfies these
requirements may then be used in diagnostic methods
which detect the presence of such sequence in human
breast tissue by standard techniques, including PCR
techniques which assay for the presence of the
molecule, but also, where appropriate, Southern blot,
Northern blot, or Western blot techniques, to name but
a few.

In preferred embodiments, the present invention
relates to a portion of MMTV localized between MMTV env
gene sequences 976 and 1640 (Majors, J.E. and Varmus,
H.E., 1983, J. Virol. 47:495-504; see Fig. 7). This
about 660 bp sequence (hereafter, "the 660 bp
sequence") has been found to exhibit low (16 percent)
homology to the prototype human endogenous retrovirus
HERV-K10, using the IBI/Pustell Sequence Analysis
Program, and has also been shown to be present in 121
(38.5%) of 314 unselected breast cancer tissue samples,
in cultured breast cancer cells, in 2 of 29 breast
fibroadenomas (6.9%) and in 2 of 107 breast specimens
from reduction mammoplasties (1.8%). The sequence was
not found in normal tissues including breast,
lymphocytes from breast cancer patients nor in other human
cancers or cell lines (see example section, infra).
Similarly, an about 250 bp sequence (hereafter "the
250 bp sequence"), between positions 1388 and 1640 in
the env gene, and therefore falling within the 660 bp
sequence, was detected in 60 (39.7%) of 151 breast
cancer, and in one of 27 normal breast samples assayed
from paraffin-embedded sections. Cloning and
sequencing of the 660 bp and 250 bp sequences demonstrated
that they are 95-99% homologous to MMTV env gene, but
not to the known human endogenous retroviruses ("HERs")
nor to other viral or human genes (<18%).
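
The detection frequencies recited above are point estimates from finite panels. As a hedged illustration, an approximate 95 percent Wilson score interval may be computed for each reported count. This interval is not part of the specification; the function name `wilson_interval` and the choice of z = 1.96 are assumptions made for the example, while the counts are those stated in the preceding paragraph.

```python
from math import sqrt


def wilson_interval(positives: int, total: int, z: float = 1.96):
    """Approximate Wilson score interval (lower, upper) for a proportion."""
    if total <= 0 or not 0 <= positives <= total:
        raise ValueError("need 0 <= positives <= total and total > 0")
    p = positives / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Counts as reported in the text: (positive samples, samples tested).
    reported = {
        "breast cancer, 660 bp sequence": (121, 314),
        "fibroadenoma, 660 bp sequence": (2, 29),
        "reduction mammoplasty, 660 bp sequence": (2, 107),
        "breast cancer, 250 bp sequence": (60, 151),
    }
    for label, (k, n) in reported.items():
        low, high = wilson_interval(k, n)
        print(f"{label}: {low:.3f} to {high:.3f}")
```

The smaller panels give much wider intervals, which is one reason the text asks that sample numbers be large enough for the stated percentages to be meaningful.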

In another preferred embodiment, the present
invention relates to a nucleic acid molecule which
corresponds to a retroviral genomic fragment which has
substantial homology to 3' LTR and/or env gene of the
MMTV genome, and is found in a substantial percentage
of breast cancer samples. By substantial percentage is
meant at least 20% of tested breast cancer samples.
Such a sequence is preferably comprised of the 3' LTR
region and all or part of the env gene, although it may
include more sequences of a retroviral genome. Most
preferably, the sequence is at least comprised of an
about 2.6 kb fragment which comprises the 1,228 base
pair (bp) sequence of the 3' LTR sequence and 1,336 bp
of the env gene sequence of MMTV (Fig. 9) (SEQ ID
NO: 20). When compared with the two strains of MMTV C3H
and BR6, the sequence homology was 90.8% and 90.7%,
respectively. When compared with the endogenous
retroviral sequences (HUMERKA), sequence homology was
only 58% in 36 bp and 71% in 74 bp.

Retrovirus proviral sequences can be detected by
PCR technology using primers derived from the MMTV
genome. Such primers include primer 5L, containing the
nucleotides 7376-7395 of the MMTV BR6 genome (5'-3':
CCAGATCGCCTTTAAGAAGG) (SEQ ID NO: 11) and primer LTR 3,
containing nucleotides 9918-9927 of the MMTV BR6 genome
(5'-3': CGAACAGACACAAAGCGACG) (SEQ ID NO: 19). Other
primers which correspond to or are homologous to MMTV
sequences can be used as primers. Nucleotide fragments
which correspond to or are homologous to the retroviral
sequences isolated from the breast cancer samples can
also be used to amplify additional retroviral fragments
from the samples. Long PCR techniques can be used to
amplify longer stretches of a proviral sequence.
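
The way a primer pair delimits an amplified fragment may be sketched as follows. The example assumes exact matching of each primer to the template, which real PCR does not require, and uses a made-up template and a made-up reverse primer; only the forward primer is the 5L sequence given above (SEQ ID NO: 11). The names `reverse_complement` and `amplicon_length` are hypothetical.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Length of the product expected from exact primer matches, or None.

    `forward` must occur in the given strand of `template`; `reverse`
    must occur as its reverse complement downstream of the forward site.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start


if __name__ == "__main__":
    forward = "CCAGATCGCCTTTAAGAAGG"   # primer 5L, SEQ ID NO: 11
    reverse = "GGTTCCAAGGTTCCAAGG"     # hypothetical reverse primer
    # Hypothetical template: forward site, 30 bp spacer, reverse site.
    template = forward + "A" * 30 + reverse_complement(reverse)
    print(amplicon_length(template, forward, reverse))
```

A practical primer search would tolerate mismatches and check both strands; this sketch only shows how the two primer sites bound the product.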

The present invention provides for compositions
comprising an isolated and purified nucleic acid
molecule which hybridizes to the about 2.6 kb
retroviral fragment shown in Fig. 9 under stringent
conditions or is at least 90 percent homologous to said
fragment using the MacVector homology determining
program, and which may be used to diagnose breast cancer in
a subject, using methods which include PCR and Southern
blot methods.

Nucleic acids having the 660 bp sequence, the
250 bp sequence, or all or part of the about 2.6 kb
sequence, may therefore be used, according to the
invention, to diagnose breast cancer in a subject,
using methods which include PCR and Southern blot
methods. Where PCR methods are used, primers such as
those listed in Table 1, below, may be utilized.

The present invention provides for compositions
comprising essentially purified and isolated nucleic
acid having the 660 bp sequence or the 250 bp sequence
or an at least five bp, and preferably greater than or
equal to ten bp, subsequence thereof. In order to
maintain the desired specificity, such nucleic acid
molecules may preferably contain sequence falling
within the 660 bp sequence, but preferably do not
contain sequences from other portions of the MMTV
genome, which may, undesirably, hybridize to human
sequences which are not breast cancer specific, such as
HERs. Accordingly, the present invention provides for
compositions wherein the isolated and purified nucleic
acid molecule comprises at least a portion having a
nucleic acid sequence which hybridizes to a region of
the mouse mammary tumor virus env gene between residues
976 and 1640, or between residues 1388 and 1640, and
wherein the isolated and purified nucleic acid molecule
does not hybridize to any other region of the MMTV
genome.

The 660 bp sequence, in various embodiments, may
have a number of nucleotide sequences. For example, in
one embodiment, the 660 bp sequence may have a sequence
as set forth in Fig. 8 and designated "MMTENV-like
sequence" (SEQ ID NO:17), which depicts the MMTV env
sequence between residues 976 and 1640. In a second
series of embodiments, the 660 bp sequence may have a
sequence as set forth in Fig. 8 and designated "MS1627"
(SEQ ID NO:18), which depicts a predominant sequence
for the 660 bp sequence as it has been defined by
sequencing analysis of the products of PCR reactions
using DNA from human breast cancer tissues. In still
further embodiments, the 660 bp sequence may have
various other nucleotide sequences obtained by
sequencing the results of PCR reactions to detect the
presence of the 660 bp sequence in human breast cancer
tissues.

In related embodiments, the present invention
provides for compositions comprising PCR primers
that may be used to detect the presence of the
aforementioned molecules or other MMTV-like sequences.
For example, the compositions may comprise one or more
of the following primer molecules (5'-3'):
CCTCACTGCCAGATC (SEQ ID NO:1); GGGAATTCCTCACTGCCAGATC
(SEQ ID NO:2); CCTCACTGCCAGATCGCCT (SEQ ID NO:3);
TACATCTGCCTGTGTTAC (SEQ ID NO:4); CCTACATCTGCCTGTGTTAC
(SEQ ID NO:5); CCGCCATACGTGCTG (SEQ ID NO:6);
ATCTGTGGCATACCT (SEQ ID NO:7); GGGAATTCATCTGTGGCATACCT
(SEQ ID NO:8); ATCTGTGGCATACCTAAAGG (SEQ ID NO:9);
GAATCGCTTGGCTCG (SEQ ID NO:10); CCAGATCGCCTTTAAGAAGG
(SEQ ID NO:11); TACAGGTAGCAGCACGTATG (SEQ ID NO:12);
CGAACAGACACAAACACACG (SEQ ID NO:19).

The use of such compositions and molecules in PCR
and Southern blot techniques is illustrated in the
non-limiting examples set forth below. The correlation
between the presence of the MMTV-related nucleic acid
molecules described above and breast cancer allows such
molecules and compositions to be utilized in the
diagnosis of breast cancer. Accordingly, the present
invention provides for a method of diagnosing breast
cancer, wherein the detection of such nucleic acid
molecules bears a positive correlation to the existence
of breast cancer in a human. The results of such
evaluation, together with additional clinical symptoms,
signs, and laboratory test values, may be used to
formulate the complete diagnosis of the patient.

In further related embodiments, the present
invention provides for an essentially purified peptide
encoded by a nucleic acid molecule which (i) hybridizes
to a gene of MMTV; (ii) is present in at least
20 percent of DNA samples prepared from breast cancer
tissue of different human subjects; and (iii) is
present in less than 5 percent of DNA samples prepared
from tissues other than breast cancer tissue from
different human subjects. In preferred embodiments, the
MMTV gene is the env gene.

Such peptides may be used in the diagnosis of
breast cancer. Accordingly, the present invention
provides for a method of diagnosing breast cancer in
a human subject, comprising detecting the presence of
a peptide encoded by a nucleic acid molecule which
(i) hybridizes to the env gene of a mouse mammary tumor
virus; (ii) is present in at least 20 percent of DNA
samples prepared from breast cancer tissue of different
human subjects; and (iii) is present in less than
5 percent of DNA samples prepared from tissues other
than breast cancer tissue from different human
subjects.

The present invention also provides for antibodies
(including monoclonal and polyclonal antibodies) which
specifically bind to such peptides. Such antibodies may
be used in methods of diagnosing breast cancer, for
example, but not by way of limitation, by Western blot,
immunofluorescent techniques, and so forth.

In nonlimiting embodiments of the invention, the
skilled artisan may evaluate MMTV-like nucleic acid
molecules for regions which would be considered likely
to encode immunogenic peptides (using, for example,
hydropathy plots). Such peptides may then be sequenced
and used to produce antibodies that may be employed in
diagnostic methods as set forth above.
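
As one illustration of the hydropathy-plot screening mentioned above, the following minimal Python sketch computes a sliding-window hydropathy profile using the published Kyte-Doolittle scale. The function names, the window size of 5 and the choice of example peptide (SEQ ID NO:13, as given in this specification) are illustrative assumptions and are not part of the described method; windows with strongly negative (hydrophilic) scores are commonly treated as candidate surface-exposed regions.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(peptide, window=5):
    """Return the mean hydropathy of each window along the peptide."""
    if window < 1 or window > len(peptide):
        raise ValueError("window must be between 1 and the peptide length")
    scores = [KYTE_DOOLITTLE[aa] for aa in peptide.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophilic_window(peptide, window=5):
    """Return (start index, subsequence, mean score) of the lowest-scoring window."""
    profile = hydropathy_profile(peptide, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, peptide[start:start + window], profile[start]


if __name__ == "__main__":
    peptide = "LKRPGFQEHEMI"  # SEQ ID NO:13, as given in the text
    for i, score in enumerate(hydropathy_profile(peptide)):
        print(i, peptide[i:i + 5], round(score, 2))
    print(most_hydrophilic_window(peptide))
```

A profile of this kind only ranks regions by hydrophilicity; whether a region is in fact immunogenic would still have to be established experimentally.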

For example, certain peptides encoded by portions
of the 660 bp sequence have been synthesized. These
peptides, which have the sequences LKRPGFQEHEMI (SEQ ID
NO:13) and GLPHLIDIEKRG (SEQ ID NO:14), have been used
to produce antibodies in rabbits, and the resulting
antisera have successfully identified breast cancer
cells positive for MMTV env-like sequences by PCR
assay. Other peptides encoded by the 660 bp sequence which
may be useful according to the invention include
TNCLDSSAYDTA (SEQ ID NO:15) and DIGDEPWFDD (SEQ ID
NO:16).

6. Example: The Detection of Mouse Mammary Tumor
Virus Env Gene-Like Sequences in Human Breast
Cancer Cells and Tissues

6.1. Materials and Methods
DNA from breast cancer tissue and other human
cancer tissues, human placentas, normal human tissues
including breast, and from several human cell lines
(including eight breast cancer cell lines), and two
normal breast cell lines was extracted following the
procedure of Delli Bovi et al. (1986, Cancer Res.
46:6333-6338). The DNA was resuspended in a solution
containing 0.05 M Tris HCl buffer, pH 7.8, and 0.1 mM
EDTA, and the amount of DNA recovered was determined by
microfluorometry using Hoechst 33258 dye (Cesarone, C.
et al., 1979, Anal. Biochem. 100:188-197). Plasmids
containing the cloned genes of MMTV were obtained from
the ATCC, propagated in Escherichia coli cultures and
purified using anion-exchange minicolumns (Qiagen) or
by precipitation with polyethylene glycol (Sambrook, J.,
et al., 1989, in "Molecular Cloning: A Laboratory
Manual", Cold Spring Harbor). Oligonucleotide primers
were synthesized at the core facilities of the
Brookdale Molecular Biology Center at Mount Sinai
School of Medicine.

Polymerase chain reaction (PCR) was performed
using Taq polymerase following the conditions
recommended by the manufacturer (Perkin Elmer Cetus)
with regard to buffer, Mg2+ and nucleotide concentrations.
Thermocycling was performed in a DNA cycler
by denaturation at 94°C for 3 min. followed by either
35 or 50 cycles of 94°C for 1.5 min., 50°C for 2 min.
and 72°C for 3 min. The ability of the PCR to amplify
the selected regions of the MMTV env gene was tested by
using as positive templates the cloned MMTV env gene
and the genomic DNA of the MCF-7 cell line, since it
was shown to express gp52 immunological determinants
(Yang, N.S., et al., 1975, J. Natl. Cancer Inst.
61:1205-1208). Optimal Mg2+ and primer concentrations and
requirements for the different cycling temperatures
were determined with these templates. The master mix
as recommended by the manufacturer was used. To detect
possible contamination of the master mix components, a
reaction without template was routinely tested. λ DNA
and control primers provided by the manufacturer were
used as control for polymerase activity. As an
internal control, amplification of a 120 bp sequence
of the estrogen receptor gene was assayed using primers
designed and generously provided by Dr. Beth Schachter
(Mount Sinai School of Medicine, N.Y.). In addition,
primers for actin β gene amplification were also used.
The product of the PCR was analyzed by electrophoresis
in a 2% agarose gel. A 1 kb DNA ladder (Gibco
BRL) was used to identify the size of the PCR product.
To determine if the amplified sequences of the middle
region of the 660 bp faithfully reproduced the
sequences of the env gene of MMTV, an 18-mer sequence
within the env gene was used as a probe for the 660 bp
amplified sequence. The 18-mer probe was 5' end-labeled
with 32P-ATP using T4 polynucleotide kinase and
purified by the NENSORB nucleic acid purification
cartridge (NEN). Southern blot hybridization was
performed using the conditions described by Saiki
et al. (1985, Science 230:1350-1354).
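
The thermocycling program described above (94°C for 3 min., then 35 or 50 cycles of 94°C for 1.5 min., 50°C for 2 min. and 72°C for 3 min.) can be written down as a small data structure in order to estimate the total programmed hold time. The following Python sketch is illustrative only: the class and function names are hypothetical, and temperature ramp times between steps, which depend on the instrument and are not stated in the text, are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold step of a thermocycling program."""
    temp_c: float
    minutes: float


# Values taken from the text; ramp times are not modeled (assumption).
INITIAL_DENATURATION = Step(temp_c=94, minutes=3.0)
CYCLE = (
    Step(temp_c=94, minutes=1.5),  # denaturation
    Step(temp_c=50, minutes=2.0),  # annealing
    Step(temp_c=72, minutes=3.0),  # extension
)


def total_hold_minutes(n_cycles):
    """Total programmed hold time in minutes for a given number of cycles."""
    per_cycle = sum(step.minutes for step in CYCLE)
    return INITIAL_DENATURATION.minutes + n_cycles * per_cycle


if __name__ == "__main__":
    for n in (35, 50):
        minutes = total_hold_minutes(n)
        print(f"{n} cycles: {minutes} min ({minutes / 60:.1f} h)")
```

Under these assumptions each cycle holds for 6.5 min., so the 35-cycle and 50-cycle programs amount to roughly 3.8 and 5.5 hours of hold time, respectively.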
The product of the PCR (660 bp or 250 bp) was
cloned directly from the reaction mixture into the TA
cloning vector (Invitrogen) using the TA cloning kit
and following the conditions recommended by the
supplier. Direct cloning of the fragment isolated from
the gel was also performed. Plasmid DNA was purified
by CsCl density gradient centrifugation or by
precipitation with polyethylene glycol (Sambrook
et al., 1989, in "Molecular Cloning: A Laboratory
Manual", Cold Spring Harbor), restricted with HindIII
and EcoRI, electrophoresed in 2% agarose gels and
transferred to nitrocellulose filters. Southern blot
hybridization was carried out using a 5'-terminal
labeled internal probe as described above. Cloning
procedures were performed in laboratories totally
separate from those where PCR was carried out. Automated
DNA sequencing (using Applied Technology
Sequencer Model 373A) was performed in the Brookdale
Molecular Biology Center. Sequence homology was
determined using the IBI MacVector GenBank and GCG
Programs.

To prevent contamination of the samples, processing
of human tissues was performed in a laminar flow
hood. DNA extractions were done in a chemical hood
located in a different room from that where PCR was
performed. PCR assays were assembled in a biological
hood provided with ultraviolet light. Aerosol
resistant tips and dedicated positive-displacement
pipettes were used throughout. All equipment used for
PCR (microcentrifuge, electrophoresis apparatus,
pipettors) was cleaned each time with 10% sodium
hypochlorite to assure DNA decontamination (Prince and
Andrus, 1992, Biotechniques 12:358-36). After the
initial experiments were performed, the plasmid containing
the MMTV env gene was frozen and never used
again, to avoid contamination. However, to detect
plasmid contamination from our own env gene clones,
primers were designed to amplify plasmid sequences.
All the authentic MMTV env positive samples were then
tested and found negative for plasmid contamination.

Southern blotting and hybridization were performed
as described (Southern, E.M., 1975, J. Mol. Biol.
98:503-517), using the 660 bp cloned sequences labeled
by the random primer procedure (Feinberg, A.P., et al.,
1983, Anal. Biochem. 132:6-13). Prehybridization and
hybridization were performed in a solution containing
6 x SSPE, 5% Denhardt's, 0.5% SDS, 50% formamide,
100 µg/ml denatured salmon testis DNA, incubated for
18 hrs at 42°C, followed by washings with 2 x SSC and
0.5% SDS at room temperature and at 37°C and finally in
0.1 x SSC with 0.5% SDS at 68°C for 30 min (Sambrook
et al., 1989, in "Molecular Cloning: A Laboratory
Manual", Cold Spring Harbor). For paraffin-embedded
tissue sections the conditions described by Wright and
Manos (1990, in "PCR Protocols", Innis et al., eds.,
Academic Press, pp. 153-158) were followed using
primers designed to detect a 250 bp sequence.

6.2. Results 

6.2.1. Selection of Specific MMTV Env Gene Sequences
A computer search for MMTV env gene homologous
sequences was first performed, since sequence homology
between the human endogenous retroviral sequences and
MMTV had been described. The prototype of this group
of human endogenous retroviruses is HERV-K10 (Ono, M.
et al., 1986, J. Virol. 60:589-598). The sequences of
the env gene of MMTV (Majors, J.E. and Varmus, H.E.,
1983, J. Virol. 47:495-504) were aligned with sequences
of the env gene of the human endogenous retrovirus
HERV-K10 (Ono, M. et al., 1986, J. Virol. 60:589-598),
using the IBI/Pustell Sequence Analysis Program. A
region of 660 bp of low homology (16%) was localized
between MMTV env gene sequences 976 and 1640 (Majors,
J.E. and Varmus, H.E., 1983, J. Virol. 47:495-504). This
internal domain of the outer membrane of the env gene
has only one glycosylation site and is highly conserved
between strains. Two primers comprising 15 bp
sequences at positions 976-990 (primer 1) and 1626-1640
(primer 3) were first synthesized. Later, longer
primers were synthesized (1N and 3N). An 18-mer
sequence in the middle of the 660 bp MMTV env region
(1388-1405) (primer 2) was used as a probe to identify
the 660 bp sequence. A second oligomer probe was
synthesized comprising the sequence 1554 to 1568
(primer 2a) to be used for hybridization when a
sequence of around 250 bp (between positions 1388 and
1640) was amplified. For nested PCR reactions (Mullis,
K.B. and Faloona, F.A., 1987, Meth. Enzymol. 155:335-350),
another primer comprising sequences 1647 to 1661
(primer 4) was synthesized to be used with primer 1 in
the first reaction and primers 2 and 3 in the second.
Modified primers with GC clamps and extra sequences
were also synthesized and used in the PCR (primers 1a
and 3a). Another set of primers comprising sequences
984 to 1003 (5L) and 1558 to 1577 (3L) was subsequently
developed because their Tm's matched and
provided better amplification than the original
primers. The sequences are represented in Table 1.
All of them were productive in amplification reactions.
Table 1. Primer and probe sequences and location
in mouse mammary tumor virus env gene

Designation   Sequence (5'-3')           Location

1             CCTCACTGCCAGATC            976-990
1a            GGGAATTCCTCACTGCCAGATC     976-990
1N            CCTCACTGCCAGATCGCCT        976-993
2             TACATCTGCCTGTGTTAC         1388-1405
2N            CCTACATCTGCCTGTGTTAC       1386-1405
2a            CCGCCATACGTGCTG            1554-1568
3             ATCTGTGGCATACCT            1640-1626
3a            GGGAATTCATCTGTGGCATACCT    1640-1626
3N            ATCTGTGGCATACCTAAAGG       1640-1621
4             GAATCGCTTGGCTCG            1661-1647
5L            CCAGATCGCCTTTAAGAAGG       984-1003
3L            TACAGGTAGCAGCACGTATG       1558-1577
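
Using the coordinates listed in Table 1, the expected size of each amplification product follows directly from the outermost positions of the primer pair. The short Python sketch below illustrates the arithmetic for the primer pairs named in the text (1 and 3; 2 and 3; and 1 and 4 for the first round of the nested reaction). The dictionary layout and function names are hypothetical, and only four of the primers in Table 1 are included for brevity.

```python
# Primer name -> (sequence 5'-3', first listed position, second listed position),
# copied from Table 1. Reverse primers are listed high-to-low in the table.
PRIMERS = {
    "1": ("CCTCACTGCCAGATC", 976, 990),
    "2": ("TACATCTGCCTGTGTTAC", 1388, 1405),
    "3": ("ATCTGTGGCATACCT", 1640, 1626),
    "4": ("GAATCGCTTGGCTCG", 1661, 1647),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_length(forward, reverse):
    """Expected product length in bp, primers included, from env coordinates."""
    start = min(PRIMERS[forward][1:])
    end = max(PRIMERS[reverse][1:])
    if end < start:
        raise ValueError("reverse primer lies upstream of forward primer")
    return end - start + 1


if __name__ == "__main__":
    for fwd, rev in (("1", "3"), ("2", "3"), ("1", "4")):
        print(f"primers {fwd} + {rev}: {amplicon_length(fwd, rev)} bp")
    # Sense-strand sequence to which reverse primer 3 anneals
    print(reverse_complement(PRIMERS["3"][0]))
```

The computed lengths (665, 253 and 686 bp) correspond to the products referred to in the text as the 660 bp sequence, the sequence of around 250 bp, and the first-round nested product.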



6.2.2. Detection of MMTV-Like Env Gene
Sequences in Human Breast Tumor DNA

PCR was performed on DNA extracted from breast
cancer tissues, normal breast tissues and from the
plasmid containing the env gene of MMTV, using primers
1 and 3. Photographs of the ethidium bromide stained
gels of the PCR product reveal the presence of an
approximately 660 bp sequence in some of the tumors
(Fig. 1A, lanes 1 and 3) but not in the normal tissue
samples (Fig. 1A, lanes 2 and 4). As a positive control
the MMTV env gene was also amplified (Fig. 1A,
lane E). Similar results were obtained with modified
primers 1a, 3a, 3L and 5L. Southern blot hybridization
of the gel with 32P-labeled 18-mer oligonucleotide
(primer 2) indicated that this internal sequence was
present in the amplified material (Fig. 1B) and that
the bands in the gel were not artifactual.

Our initial effort was to analyze a representative
sample of breast cancer specimens as well as normal
tissues and other tumors. To date 343 breast tumors
have been processed, DNA extracted and PCR performed.
Of these 343 tumors, 314 were carcinomas and 29 were
fibroadenomas. Amplification of sequences of 660 bp
was observed in 121 of the carcinomas (38.5%) and in
2 of the 29 fibroadenomas (6.9%). These sequences
were confirmed to be MMTV env gene-like sequences by
hybridization with the labeled specific probe
containing the internal sequences. These sequences
were not detected in the DNAs extracted from 20 normal
organs, 23 cancers from other organs and 26 samples of
blood lymphocytes including 7 from breast cancer
patients whose breast specimens were positive. From
107 samples of normal breast obtained from reduction
mammoplasties, 2 were positive (1.8%). In addition to
DNA from lymphocytes from seven positive patients, DNA
from their normal breast tissue of the operated breast
was tested in 4 cases. All were negative (Table 2).
Finally, DNA of the MCF-7 and ED breast cancer cell
lines (ED being a cell line developed in our laboratory
from the pleural effusion of a patient with an
env-positive breast tumor) was shown to contain the
660 bp MMTV env gene-like sequences (Table 3), while
four other breast cancer cell lines were positive only
for the 250 bp sequence (T47-D, BT-474, BT-20 and
MDA-MB-231).
Table 2

Detection of MMTV env gene-like
sequences in human DNA extracted
from fresh or frozen tissues

                               MMTV env gene
Sample                 Number  sequences      % Positive

Breast Carcinomas      314     121            38.5%
Breast Fibroadenomas   29      2              6.9%
Normal Breasts         107     2              1.8%
*Normal Breasts        4       negative
Tumors other than
  breast               23      negative
Normal tissues         20      negative
Lymphocytes            26      negative
**Lymphocytes          7       negative

*  Histologically normal tissue from same breast
   as positive cancer.

** Lymphocytes from breast cancer patients who were
   positive for MMTV env gene sequences in the tumor.
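
The percentages in Table 2 can be reproduced from the counts, as in the following Python sketch. The 95% Wilson score interval is added purely as an illustration of the sampling uncertainty attached to each proportion; it is an assumption of this sketch and is not an analysis reported in the specification. Note that 2/107 evaluates to about 1.87%, which the table lists as 1.8%.

```python
import math

# (positive, total) counts copied from Table 2
COUNTS = {
    "Breast carcinomas": (121, 314),
    "Breast fibroadenomas": (2, 29),
    "Normal breasts": (2, 107),
}


def percent_positive(positive, total):
    """Percentage of samples that were positive."""
    return 100.0 * positive / total


def wilson_interval(positive, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = positive / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    for name, (pos, n) in COUNTS.items():
        low, high = wilson_interval(pos, n)
        print(f"{name}: {percent_positive(pos, n):.2f}% "
              f"(95% interval {100 * low:.1f}% to {100 * high:.1f}%)")
```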
Table 3. Detection of MMTV env gene-like sequences
in DNA from human cell lines in culture

Human Cell Lines                          MMTV env gene sequence

MCF-7        (breast carcinoma)           positive
T47-D        (breast carcinoma)           negative
BT-20        (breast carcinoma)           negative
MDA-MB-231   (breast carcinoma)           negative
ZR-75-1      (breast carcinoma)           negative
SK-BR-3      (breast carcinoma)           negative
BT474        (breast carcinoma)           negative
ED           (breast carcinoma)           positive
MCF-10       (normal breast)              negative
HB-447       (normal breast)              negative
HL-60        (promyelocytic leukemia)     negative
K562         (erythroleukemia)            negative
Jurkat       (T cell leukemia)            negative
Hep G-2      (hepatoma)                   negative



The nested polymerase chain reaction was used in several
instances to increase sensitivity and specificity, thus
reducing the probability of false positives. In
Fig. 2, results of a representative nested reaction
are shown using primers 1 and 4 in the first reaction
(Fig. 2A) and 2 and 3 for the 2nd reaction. The
specificity of the reaction can be seen in the 2nd
amplification (Fig. 2B).

To study a large number of samples and to be able
to perform archival studies, PCR of paraffin-embedded
tissue sections was also carried out. Primers 2 and 3
were used to amplify a 250 bp sequence within the
660 bp stretch when DNA was extracted from paraffin-embedded
tissue sections since larger size sequences
are difficult to amplify after fixation. Tumor DNA was
amplified (Fig. 3A, lanes 2-5) whereas normal breast
DNA was not (Fig. 3A, lane 1). The identification of
this 250 bp sequence with the MMTV-like env gene was
confirmed by hybridization with an internal probe
(primer 2a) as shown in Fig. 3B. Using this procedure
we have analyzed 151 breast cancer samples and found
that 60 (39.7%) possess the 250 bp sequence. Of the
27 normal breast samples obtained from reduction
mammoplasties assayed by this procedure, one was
positive (3.7%). These results, in conjunction with
those obtained from lymphocytes and from normal breast
tissue of patients whose breast cancer was PCR
positive, indicate that MMTV-like sequences are present
in a significant number of human breast cancer DNAs,
which cannot be explained by DNA polymorphism.

6.2.3. Cloning and Sequencing of the
MMTV-Like Env Gene Sequences

To find out whether there was homology to MMTV env
gene throughout the whole 660 bp stretch, the product
of the PCR from 8 different tumors was cloned and
sequenced. In Fig. 4 the sequences of different clones
comprising around 600 bp are represented, as aligned to
the MMTV env gene sequence of the GR and BR6 strains
(Redmond, S. and Dickson, C., 1983, EMBO J. 2:125-131).
This domain of the env gene in the GR strain is 100%
homologous to the C3H strain and 98% to the BR6 strain
(Majors, J.E. and Varmus, H.E., 1983, J. Virol. 47:495-504;
Moore, R. et al., 1987, J. Virol. 61:480-490).
Evaluation of the clones indicated that homology to
MMTV env gene varied from 95% to 99%. Another seven
clones comprising only 250 bp were also sequenced.
Homology to MMTV env gene varied from 95% to 99% (data
not shown). When compared to the human endogenous
provirus HERV-K10, the homology of all the clones was
less than 15%. When compared against all known viral
and human genes (more than 130,000 entries) using the
IBI MacVector GenBank and GCG programs, the highest
homology recorded was 18%.

6.2.4. Southern Blot Analysis
Using Cloned Sequences

To investigate whether the env gene-like sequences
were present in human DNA, Southern blot hybridization
was performed using the cloned sequence as probe. DNAs
from normal breast tissues, env positive or negative
breast tumors, tumors other than breast and breast
cancer cell lines were restricted with EcoRI and in
some instances with PstI, BglII or KpnI. EcoRI is a
frequent cutter restriction enzyme that digests MMTV
proviral DNA between env and pol genes. Four different
cloned 660 bp sequences were used as probes after
labeling with 32P by random prime-labeling. Results of
some of the Southern blot hybridization experiments are
shown in Fig. 5. They reveal the presence of a labeled
restriction fragment migrating at approximately 7-8 kb
in breast cancer DNA and in ED cells, and of two
fragments in MCF-7 cells. Different restriction patterns
were observed with the other three enzymes. The 660 bp sequence
was absent in 10 normal tissues, 10 fibroadenomas and
10 tumors from other tissues. It is important to
emphasize that hybridization conditions for these
experiments were stringent (as described in Section
6.1) to avoid interference with endogenous sequences
that might interact with the probes.

7. Example: Detection of a Retrovirus
Proviral Fragment in Human
Breast Cancer Cells and Tissues

7.1. Materials and Methods

To detect longer retrovirus proviral fragments in
breast cancer samples, DNA was extracted from breast
carcinoma tissue samples as described above in
Section 6.1. Two rounds of long PCR were performed on
the DNA using primers 5L (SEQ ID NO:11) and LTR3 (SEQ ID
NO:19). Primer 5L contains nucleotides 7370-7395
of the MMTV BR6 genome (5'-3': CCAGATCGCCTTTAAGAAGG)
(SEQ ID NO:11) and primer LTR3 contains nucleotides
9918-9927 of the MMTV BR6 genome (5'-3':
CGAACAGACACAAAGCGACG) (SEQ ID NO:19). Long PCR was
performed using protocols described by the manufacturer
(Perkin Elmer, Foster City, CA). The amplified
retroviral fragment isolated from the breast cancer
sample was cloned into the TA cloning vector
(Invitrogen) and automated sequencing was performed
as described in Section 6.1.

7.2. Results

An approximately 2.6 kb retroviral fragment
containing 1,228 bp of the 3' LTR sequence and 1,336 bp
of the env gene sequence of a potential provirus was
detected in a human breast carcinoma tissue sample by
the long PCR technique using the 5L and LTR3 primers.
The sequence of this retroviral fragment is shown in
Fig. 9 (SEQ ID NO:20).

When compared with the two strains of MMTV, C3H
and BR6, the sequence homology was 90.8% and 90.7%,
respectively, over the MMTV genomic fragment from
nucleotides 7370-9937. When compared with the
endogenous retroviral sequences (HUMERKA), sequence
homology was only 58% in 36 bp and 71% in 74 bp.
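
The homology figures above are percentages over a stated aligned length. As a rough illustration, and assuming that "homology" here means percent identity over the aligned positions (the exact scoring used by the MacVector and GCG programs is not described in the text), the following Python sketch computes percent identity for two pre-aligned sequences and converts a quoted percentage back into an approximate number of matching positions. The function names and the short test sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned positions carrying the same base ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        a == b and a != "-"
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * matches / len(aligned_a)


def matches_implied(percent, aligned_length):
    """Approximate number of identical positions implied by a percentage."""
    return round(percent * aligned_length / 100)


if __name__ == "__main__":
    # Hypothetical 4-base alignment with one mismatch
    print(percent_identity("ACGT", "ACGA"))
    # Figures quoted in the text for the HUMERKA comparison
    print(matches_implied(58, 36), "of 36 positions")
    print(matches_implied(71, 74), "of 74 positions")
```

Read this way, the HUMERKA comparison corresponds to only about 21 of 36 and about 53 of 74 matching positions, over short stretches of an approximately 2.6 kb fragment.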

8. Discussion
The search for virus-related sequences in human breast
cancer has been hampered by great variation reported
in previous studies, by the presence of endogenous
retroviral sequences in human DNA and by the lack of
sensitivity of the methods employed. The studies
reported herein circumvent these deficiencies by
focusing on sequences with low homology to human
endogenous retroviruses, by investigating a large
number of tumors and several types of controls and
by using the most sensitive technology presently
available.

The results indicate that unique MMTV env gene
sequences were present in 38.5% of the breast cancer
samples analyzed and 39.7% of archival samples of
breast cancer and that these sequences were absent in
normal tissues including lymphocytes from patients with
positive breast cancer and in cancers other than
breast. Normal breast tissue and fibroadenomas had a
low frequency (1.8 to 6.9%) of positive results. When
cloned and sequenced, the sequences were found to be
highly homologous to MMTV env gene, but not to the
endogenous retroviral sequences. Furthermore,
experiments in which the cloned amplified sequences
were used for hybridization with DNA from breast cancer
or normal tissues revealed that homologous DNA was only
present in breast cancer DNA. The results also
indicate that a human breast carcinoma sample contained
an about 2.6 kb MMTV-like fragment comprised of 1,336
bp of the env gene and 1,228 bp of the 3' LTR.

The detection of MMTV env gene sequences in two
fibroadenomas out of 29 and in two normal breast tissue
samples out of 107 samples is of uncertain significance.
Although such results could potentially be
artifactual, and thus may represent false positives,
they may alternatively indicate the presence of
histologically unrecognized cells that were or will be
neoplastic.

Ninety percent (90%) of the breast cancers tested
were invasive ductal carcinomas, which reflects the
prevalence of this type of neoplasm. Most patients
were node-positive, which is probably artifactual since
it was necessary that tumor size be sufficiently large
to provide an aliquot for research and tumor size
correlates with node positivity.

It is unlikely that differences in homology
between MMTV env gene and the cloned human sequences
are generated by errors committed by the Taq
polymerase. It has been estimated that the rate of
nucleotide misincorporation is 1 x 10^-5 per cycle
(Erlich et al., 1991, Science 252:1643-1651) and
therefore, a total of only 0.32 misincorporated
nucleotides should be expected in 660 bp after
50 cycles. The differences in homology between clones
from different patients are likely to represent
heterogeneity of the env gene.
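
The order of magnitude of this estimate can be checked with a simple product of error rate, template length and cycle number. The Python sketch below is a deliberately crude approximation offered for illustration only; the function name is hypothetical, and the model (every base of the product exposed to the stated error rate in every cycle) is an assumption, since the text does not state how the 0.32 figure was derived.

```python
def expected_misincorporations(error_rate_per_cycle, length_bp, cycles):
    """Crude expected number of misincorporated bases per product molecule.

    Assumes each of the length_bp positions is copied once per cycle with
    the given per-nucleotide error rate (an approximation, see lead-in).
    """
    return error_rate_per_cycle * length_bp * cycles


if __name__ == "__main__":
    estimate = expected_misincorporations(1e-5, 660, 50)
    print(f"expected misincorporations in 660 bp after 50 cycles: {estimate:.2f}")
```

This product comes to about 0.33, in line with the figure of 0.32 given above and well below the 1% to 5% sequence divergence (roughly 7 to 33 positions in 660 bp) observed between the clones and the MMTV env gene.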

In contrast to earlier, ambiguous data associating 
MMTV-like sequences with human breast cancer, we have 
clearly demonstrated the existence of such sequences in 
breast cancer cells which cannot be explained by any 
known human endogenous retroviral sequence. Our data do 
not support the results of earlier studies which 
indicated that, as in the mouse, MMTV-like sequences 
were found in lymphocytes from two patients with breast 
cancer (Crepin, M. et al., 1984, Biochem. Biophys. Res. 
Comm. 118:324-331). The absence of MMTV env-like 
sequences in lymphocytes could reflect the fate of a 
unique lymphocyte subset over decades between initial 
encounter and the appearance of clinical breast cancer; 
alternatively, the human disease may differ from the 
mouse model. Results from attempts to identify unique 
MMTV-like pol gene sequences have shown that they 
cannot be distinguished from the reverse transcriptase 
sequences of endogenous retroviruses (Deen, K.C. and 
Sweet, R.W., 1986, J. Virol. 57:422-432). 

The origin of the MMTV env gene-like and 3' LTR-like
sequences found in tumor DNA could be the result
of integrated MMTV-like sequences from a human mammary
tumor virus. Polymorphism of endogenous retroviral
sequences is conceivable but can be ruled out because
these sequences were not detected in lymphocytes from
the positive patients, in sections of the cancerous
breast from which abnormal cells were absent, or in
normal breast tissue from patients with MMTV env-like
positive tumors. Recombination during tumorigenesis
between endogenous sequences to resemble the MMTV env
genes seems highly unlikely since no known gene or
viral sequence is more than 18% homologous to the
660 bp sequence. The longer about 2.6 kb MMTV-like
fragment detected in a human breast carcinoma had
minimal homology (58% in 36 bp and 71% in 74 bp) to
endogenous human retroviral sequences. Thus, the most
conservative interpretation is that our findings represent
exogenous sequences from an agent similar to MMTV.
Recombination between endogenous and exogenous env gene
sequences is known to accelerate the development of
malignancies in mice (DiFronzo, N.L. and Holland, C.A.,
1993, J. Virol. 67:3763-3770). Whether the MMTV-like
sequences belong to an entire acquired provirus or to
an exogenous fragment integrated into endogenous
sequences is presently not known. Experiments are in
progress to distinguish between these possibilities.

Several genetic alterations have been identified
in human breast cancer that can be useful as markers
for prevention, detection or prognosis (reviewed in
Runnebaum, I. et al., 1991, Proc. Natl. Acad. Sci. USA
88:10657-10661). The BRCA1 and BRCA2 genes have
recently been described. They account for at least
5% of breast cancer and are related to familial breast
cancer (Miki, Y. et al., 1994, Science 266:66-71;
Wooster, R. et al., 1994, Science 265:2088-2090). We
have primary evidence that familial clustering of the
MMTV env gene-like sequences occurs, accounting for an
even higher percentage of cancers in affected families
(Holland et al., 1994, Proc. Am. Assoc. Cancer Res.
35:218). The presence of MMTV-like sequences may be
correlated with special clinical disease status, may
provide another potential molecular marker, and may
distinguish a subset of human breast cancer for which
viral etiology is tenable. This has implications for
epidemiology, therapy and prevention.

Various publications are cited herein, the
contents of which are hereby incorporated by reference
in their entireties.
SEQUENCE LISTING

(1) GENERAL INFORMATION

(i) APPLICANT: HOLLAND, JAMES

(ii) TITLE OF THE INVENTION: DETECTION OF MAMMARY TUMOR VIRUS-LIKE
SEQUENCES IN HUMAN BREAST CANCER

(iii) NUMBER OF SEQUENCES: 20

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Brumbaugh, Graves, Donohue & Raymond
(B) STREET: 30 Rockefeller Plaza
(C) CITY: New York
(D) STATE: NY
(E) COUNTRY: USA
(F) ZIP: 10112-0228

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette
(B) COMPUTER: IBM Compatible
(C) OPERATING SYSTEM: DOS
(D) SOFTWARE: FastSEQ Version 1.5

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: NOT YET ASSIGNED
(B) FILING DATE: 08-NOV-1996
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 08/555,394
(B) FILING DATE: 09-NOV-1995

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Kole, Lisa B
(B) REGISTRATION NUMBER: 35,225
(C) REFERENCE/DOCKET NUMBER: 30363-PCT - 165/

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 212-408-2628
(B) TELEFAX: 212-765-2519
(C) TELEX:



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

CCTCACTGCC AGATC 15
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

GGGAATTCCT CACTGCCAGA TC 22



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

CCTCACTGCC AGATCGCCT 19






(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

TACATCTGCC TGTGTTAC 18



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

CCTACATCTG CCTGTGTTAC 20



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

CCGCCATACG TGCTG 15

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

ATCTGTGGCA TACCT 15

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

GGGAATTCAT CTGTGGCATA CCT 23

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

ATCTGTGGCA TACCTAAAGG 20

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

GAATCGCTTG GCTCG 15

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

CCAGATCGCC TTTAAGAAGG 20
(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(ix) FEATURE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

TACAGGTAGC AGCACGTATG 20



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Leu Lys Arg Pro Gly Phe Gln Glu His Glu Met Ile
1               5                   10



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Gly Leu Pro His Leu Ile Asp Ile Glu Lys Arg Gly
1               5                   10



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Thr Asn Cys Leu Asp Ser Ser Ala Tyr Asp Thr Ala
1               5                   10



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE: N-terminal
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Asp Ile Gly Asp Glu Pro Trp Phe Asp Asp
1               5                   10



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 662 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

TCCTCACTGC CAGATCGCCT TTAAGAAGGA CGCCTTCTGG GAGGGAGACG AGTCTGCTCC 60
TCCACGGTGG TTGCCTTGCG CCTTCCCTGA CCAAGGGGTG AGTTTTTCTC CAAAAGGGGC 120
CCTTGGGTTA CTTTGGGATT TCTCCCTTCC CTCGCCTAGT GTAGATCAGT CAGATCAGAT 180
TAAAAGCAAA AAGGATCTAT TTGGAAATTA TACTCCCCCA GTCAATAAAG AGGTTCATCG 240
ATGGTATGAA GCAGGATGGG TAGAACCTAC ATGGTTCTGG GAAAATTCTC CTAAGGATCC 300
CAATGATAGA GATTTTACTG CTCTAGTTCC CATACAGAAT TGTTTCGCTT AGTTGCAGCC 360
TCAAGATATC TTATTCTCAA AAGGCAGGAT TTCAGGAACA TGAGATGATT CCTACATCTC 420
TGTGTTACTT ACCCTTATGT CATATTATTA GGATTACCTC AGCTAATAGA TATAGAGAAA 480
GAGGATCTAC TTTTCATATT TCCTGTTCTT CTTGTAGATT GACTAATTGT TTAGATTCTT 540
CTGCCTACGA CTATGCAGCG ATCATAGTCA AGAGGCCGCC ATACGTGCTG CTACCTGTAG 600
ATATTGGTGA TGAACCATGG TTTGATGATT CTGCCATTCA AACCTTTAGG TATGCCACAG 660
AT 662



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 663 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

TCCTCACTGN CAGATCGCCT TTAAGAAGGA CGCCTTCTGG GAGGGAGACG AGTCTGCTCC 60
TCCACGGTGG TTGACTTGCG CCTTCCCTGA CCAGGGGGTG AGTTTTTCTC CAAAAGGGGC 120
CCTTGGGTTA CTTTGGGATT TCTCCCTTCC CTCGCCTAGT GTAGATCAGT CAGATCAGAT 180
TAAAAGCAAA AAGGATCTAT TTGGAAATTA TACTCCCCCT GTCAATAAAG AGGTTCATCG 240
ATGGTATGAA GCAGGATGGG TAGAACCTAC ATGGTTCTGG GAAAATTCTC CTAAGGATCC 300
CAATGATAGA GATTTTACTG CTCTAGTTCC CATACAGAAT TGTTTCGCTT AGTTGCAGCC 360
TCAAGATATC TTATTCACAA AAGGCAGGAT TTCAAGAACA TGACATGAAT CCCTACATCT 420
CTGTGTTACT TACCCTTATG CCANANTATT AGGATTACCT CAGCTAATAG ATATAGAGGA 480
AGAGGATCTA CTTTTCATAT TTCCTGTTCT TCTTGTAGAT TGACTAATTG TTTAGATTCT 540
TCTGCCTACG ACTATGCAGC GATCATAGTC AAGAGGCCGC CATACGTGCT GCTACCTGTA 600
GATATTGGTG ATGAACCATG GTTTGATGAN NCTGCCANTC AAACCTTTAG GTATNCCACA 660
GAT 663



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

CGAACAGACA CAAACACACG 20



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2598 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTISENSE: NO
(v) FRAGMENT TYPE:
(vi) ORIGINAL SOURCE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

CGAACAGACA CAAACACACG AGAGGTGAAT GTTAGGACTG TTGCAAGTTT ACTCAAAAAA 60
CAGCACTCTT TTATATCATG GTTTACATAA GCATTTACAT AAGACTTGGA TAAGTTCCAA 120
AAGAACATAG GAGAATAGAA CACTCAGAGC TTAGATCAAA ACATTTGATA CCAAACCAAG 180
TCAGGAAACC ACTTGTCTCA CATCCTTGTT TTAAGAACAG TTTGTGACCC TGAACTTACT 240
TAAACCTTGG GAACCGCAAN GTTGGGCTCA TAAAGGTTAT CCATTATAGC TCATGCCAAA 300
ATTATCTGCA GAAATGTGTT CCTAATTGTC TAGCCACTGC CCCCTCCCTT GGTATAATGA 360
AAATCTTTCC CCCAACGTTC ATCCCACTCC CCTAGATAAA TATAATCATG TACCTGTTGT 420
TTTATGTCGT CTTTTTCTTC CTGAGTTAAC ACACACCAAG GAGGTCTAGC TCTGGCGAGT 480
CTTTCACGAA AGGGGAGGGA TCTGTACAAC ACTTTATAGC CGTTGACTGT GACCCACCTA 540
TCGAAATTTA AATCGTATCT TCCTGTATAT GGTAGCGGGG CGTCTGTTGG TCTGTAGATG 600
TAAGTCCCGG TTGCCACCAC CTGTCTCCTA TTTTGACAAG CGTACTCCTC TTTCCCCTTT 660
TTACTTCTAG GCCTGAGGCC CTTAGTCCTT GCACCTGTTC TTCAACTGAG GTTGAGCGTC 720
TCTTTCTATT TTCTATTCCC ATTTCTAACC TTTGAATTTG AGTAAATATA GTGCTAAAAG 780
ACAAAGATTC ATTTCTTAAC ATCATGATTA ATAATCGACC TATTGGATTG GTCTTATTGG 840
TAAAAATATA ATTTTTAGCA AGCATTCTTA TTTCTATTTC TGAAGGACAA AGTCGGTGTG 900
GCTTGTAANA GGAANTTGGC TGTGGTCCTT GCCCCACGAG GAAGGTCGAG TTCTCCGAAT 960
TGTTTAGATT GTAATCTTGC ACAGAAGAGT TATTAAAAGA ATCAAGGGTG AGAGCCCTGC 1020
GAGCACGAAC CGCAACTTCC CCCAATAGCC CCAGGCAAAG CAGAGCTATG CCAAGTTTGC 1080
AGCAGANAAT GAGTATGTCT TTGTCTGATG GGCTCATCCG CGTGCACGCA GACGGGTCGT 1140
CCTTGGTGGG AAACAACCCC TTGGCTGCTT CTCTCCTAAG TGTAGGACAC TCTCGGGAGT 1200
TCAACCATTT CTGCTGCAGG CGCGGCATTT CCCCCTTTTT TCTTTTTTAA AAGAAGCACG 1260
TTAAGATCTG ACTGCACTTG GTCAAGGCTC TTCGCAAAGC ACTGGAAAAT AACGGGGAAA 1320
ATCATAAGTA CTATGACCAA AAGCAGGGCT CCAACTCCTA TAAAAATGAA ATATTGTGTT 1380
CTAATCCAAT GGATTTAAAG CCTTTACTCC ATTGGCNAAG GANTGANCCA ACCCCTGAGG 1440
TCCCTGCGTT CAAATTTTTT TGCTCNTATC CTAATCCAAT TGGTAACCCC GTTTNTTTTT 1500
GAAACTCATG TCTTCAAATG CCCAATAAAT GAGCCCTGGT TCTTTCCCAG CTCTCAGAAG 1560
CATTATACGG NANAGGTGTG ACACAGCATA AAATCATAAT TTGCATGACA CCTAGTGGAC 1620
ATTCTGGTCT TTAAGTTTGC CACATCTTGT CCCAACTCTA AAACTACTTC TTCTAAAGCA 1680
TTAAGTCTAG CTTTCAATTT TAAGTCTATT ATTCTTTGTT CAGATNAGGC TAATGTAACA 1740
TTTCTATGAA GATTATTAAC AAACGTAGCA GTTTGCATCT CCTTAACTAA GGCAGTAGTA 1800
GCTACAGCAA AGGAAGTGAT AATAGCAATT AAAGCAGATA TGCCCAGAAT AATGGCAGCG 1860
ACGAATCGCT TAGCTCGAAT TAAATCTGTG GCATACCTAA AGGTTTGAAT GGCAGAATCA 1920
TCAAACCATG GTTCATCACC AATATCTACA GGTTACAACA CATATGGCGG CCCCTTGAAT 1980
ATGAATCGCT GCATATCCGT NGGCAAAAAA TCTAACCATT ATTCCTCCTN CCNAAAAACG 2040
GGATTTGAAA NTTATNCCCC TTNCCCCNAA CCCANACCGA GGTACCCCAT AATGNGGGGG 2100
GTATCTANAA NAGGGCATAG GGGTAAGAAA AACGGCAGAG NGGGATCNTT TATGTTCNGG 2160
AAATTCNGGG TTTGGGAGAA TAAGATTCTG GAGGCTGCAA ATTAAGGGAA ACATTNTGTA 2220
TGGGGAATAG AGCAGTAAAA TCTCTATCAT GGGGATCTTT AGGGAGAATT TTCCCAGGAA 2280
CCAAGTAGGT TCNAACCCAT CNTGCTTCAT ACCATCGATG AACNTCTTTA TTGACAGGGG 2340
GAGTATAATT TCCAAATAGA TCCTTTTTGT TTTTAATCTG ATCTGACTGA TCTACACTAG 2400
GCGGGGGAAG GGAGAAATCC CAAAGTAACC CAAGGGCCCC TTTTGGAGAA AAACTCACCC 2460
CCTGGTCAGG GAAGGCGCAA GGCAACCACC GTGGAGGAGC AGACTCGTCT CCCTCCCAGA 2520
AGGCGTCCTT CTTAAAGGCG ATCTGGAGGA GCAGACTCGT CTCCCTCCCA GAAGGCGTCC 2580
TTCTTAAAGG CGATCTGG 2598
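
As an illustration of how the primer sequences of the listing relate to the amplified env gene-like sequence, the following minimal sketch locates a primer on a template by exact string matching, checking the given strand first and then the reverse complement. The function names are hypothetical and are not part of the specification, the template strings are only the first 60 and the last 62 residues of SEQ ID NO: 17, and exact matching is a simplification: it says nothing about annealing behavior under actual PCR conditions.

```python
# Illustrative sketch only; helper names are hypothetical, not from the specification.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primer(primer, template):
    """Return (strand, 0-based offset) of the first exact match, or None.

    '+' means the primer matches the template as written; '-' means its
    reverse complement matches, i.e. the primer anneals to the opposite strand.
    """
    pos = template.find(primer)
    if pos != -1:
        return ("+", pos)
    pos = template.find(reverse_complement(primer))
    if pos != -1:
        return ("-", pos)
    return None


# First 60 and last 62 residues of SEQ ID NO: 17, with spacing removed.
SEQ17_START = "TCCTCACTGCCAGATCGCCTTTAAGAAGGACGCCTTCTGGGAGGGAGACGAGTCTGCTCC"
SEQ17_END = "ATATTGGTGATGAACCATGGTTTGATGATTCTGCCATTCAAACCTTTAGGTATGCCACAGAT"

PRIMERS = {
    "SEQ ID NO: 1": "CCTCACTGCCAGATC",
    "SEQ ID NO: 7": "ATCTGTGGCATACCT",
    "SEQ ID NO: 11": "CCAGATCGCCTTTAAGAAGG",
}

if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, "start:", find_primer(primer, SEQ17_START),
              "end:", find_primer(primer, SEQ17_END))
```

In this sketch SEQ ID NOS: 1 and 11 are found on the strand as listed near the start of SEQ ID NO: 17, while SEQ ID NO: 7 is found only as its reverse complement at the end, which is consistent with a forward and reverse primer pair flanking the sequence.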
Claims

1. A composition comprising an oligonucleotide primer
which may be used to detect the presence of a
nucleic acid molecule which (i) hybridizes to the
env gene of a mouse mammary tumor virus; (ii) is
present in at least 38 percent of DNA samples
prepared from breast cancer tissue of different
human subjects; and (iii) hybridizes to less than
7 percent of DNA samples prepared from tissues
other than breast cancer tissue from different
human subjects.

2. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCTCACTGCCAGATC (SEQ ID NO: 1).

3. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
GGGAATTCCTCACTGCCAGATC (SEQ ID NO: 2).

4. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCTCACTGCCAGATCGCCT (SEQ ID NO: 3).

5. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
TACATCTGCCTGTGTTAC (SEQ ID NO: 4).

6. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCTACATCTGCCTGTGTTAC (SEQ ID NO: 5).

7. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCGCCATACGTGCTG (SEQ ID NO: 6).
8. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
ATCTGTGGCATACCT (SEQ ID NO: 7).

9. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
GGGAATTCATCTGTGGCATACCT (SEQ ID NO: 8).

10. The composition of claim 1, wherein the
oligonucleotide primer comprises a sequence
selected from the group consisting of
ATCTGTGGCATACCTAAAGG (SEQ ID NO: 9);
GAATCGCTTGGCTCG (SEQ ID NO: 10);
CCAGATCGCCTTTAAGAAGG (SEQ ID NO: 11); and
TACAGGTAGCAGCACGTATG (SEQ ID NO: 12).

11. An essentially purified peptide encoded by a
nucleic acid molecule which (i) hybridizes to
a gene of MMTV; (ii) is present in at least
20 percent of DNA samples prepared from breast
cancer tissue of different human subjects; and
(iii) is present in less than 5 percent of DNA
samples prepared from tissues other than breast
cancer tissue from different human subjects.

12. An antibody which specifically binds to the
peptide of claim 11.

13. The peptide according to claim 11 which comprises
the amino acid sequence LKRPGFQEHEMI (SEQ ID
NO: 13).

14. An antibody which specifically binds to the
peptide of claim 13.

15. The peptide according to claim 11 which comprises
the amino acid sequence GLPHLIDIEKRG (SEQ ID NO: 14).
16. A method of diagnosing breast cancer in a human
subject, comprising detecting the presence of a
peptide encoded by a nucleic acid molecule which
(i) hybridizes to the env gene or 3' LTR of a
mouse mammary tumor virus; (ii) is present in at
least 20 percent of DNA samples prepared from
breast cancer tissue of different human subjects;
and (iii) is present in less than 5 percent of DNA
samples prepared from tissues other than breast
cancer tissue from different human subjects.

17. The method according to claim 16, wherein the
presence of a peptide comprising the amino acid
sequence LKRPGFQEHEMI (SEQ ID NO: 13) is detected
by the binding of an antibody specific to the
peptide.

18. The method according to claim 16, wherein the
presence of a peptide comprising the amino acid
sequence GLPHLIDIEKRG (SEQ ID NO: 14) is detected
by the binding of an antibody specific to the
peptide.

19. The method according to claim 16, wherein the
presence of a peptide comprising the amino acid
sequence TNCLDSSAYDTA (SEQ ID NO: 15) is detected
by the binding of an antibody specific to the
peptide.

20. The method according to claim 16, wherein the
presence of a peptide comprising the amino acid
sequence DIGDEPWFDD (SEQ ID NO: 16) is detected by
the binding of an antibody specific to the
peptide.

21. A composition comprising an oligonucleotide primer
which may be used to detect the presence of a
nucleic acid molecule which (i) hybridizes to a
nucleic acid comprised of a sequence selected from
the group consisting of the env gene and the 3'
LTR of a mouse mammary tumor virus; (ii) is
present in a substantial percentage of DNA samples
prepared from breast cancer tissue of different
human subjects; and (iii) hybridizes to less than
5 percent of DNA samples prepared from tissues
other than breast cancer tissue from different
human subjects.

22. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCAGATCGCCTTTAAGAAGG (SEQ ID NO: 11).

23. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CGAACAGACACAAACACACG (SEQ ID NO: 19).
Description

Detection Of Mammary Tumor Virus-Like
Sequences In Human Breast Cancer

Cross-Reference to Related Application

This application is a continuation-in-part
application of U.S. Serial No. 08/555,394, filed
November 9, 1995.

Statement Regarding Federally Sponsored Research 

This invention was made with funds from the U.S. 
government, which has certain rights in the invention. 

Introduction

The present invention relates to materials and 
methods for diagnosing breast cancer in humans. It is 
based, at least in part, on the discovery that a 
substantial percentage of human breast cancer tissue 
samples contained nucleic acid sequences corresponding 
to a portion of the mouse mammary tumor virus env gene. 
In contrast, such sequences were absent in almost all 
other human tissues tested. 

Background of the Invention 

A large body of information has accumulated about
the molecular biology of MMTV (reviewed in Slagle,
B.L. et al., 1987, in "Cellular and Molecular Biology
of Mammary Cancer", Kidwell et al., eds., Plenum Press,
NY, pp. 275-306). Mouse mammary tumor virus (MMTV) is
associated with a high incidence of breast cancer in
certain strains of mice (over 90% among females), and
has been regarded as a potential model for human
disease.

The MMTV virus does not carry a transforming
oncogene, but rather acts as an insertional mutagen
with several proviral insertion loci designated int-1
or wnt-1 (Nusse, R. et al., 1982, Cell 31:99-109), int-2
(Peters, G. et al., 1983, Cell 33:369-377), int-3
(Gallahan, D. et al., 1987, J. Virol. 61:218-220), int-4
(Roelink, H. et al., 1990, Proc. Natl. Acad. Sci. USA
87:4519-4523) and int-5 (Morris, V.L. et al., 1991,
Oncogene Research 6:53-63), which encode growth
factors or other related proteins. These genes are not
expressed in normal mammary tissue but become activated
after integration of MMTV provirus into the adjacent
chromosomal DNA.

The human homolog of the int-2 locus has been
located on chromosome 11 (Casey, G. et al., 1986,
Mol. Cell Biol. 6:502-510) and has been found amplified
(in 15% of the breast cancers) and also expressed
(Lidereau, R. et al., 1988, Oncogene Res 2:285-291;
Zhou, D.J. et al., 1988, Oncogene 2:279-282; Liscia,
D.S. et al., 1989, Oncogene 4:1219-1224; Meyers, S.L.
et al., 1990, Cancer Res 50:5911-5918). It may be
significant that in tumors from Parsi women, who have a
high incidence of breast tumors, the int-2 locus is
amplified in 50% of the cases (Barnabas-Sohi, N. et
al., 1993, Breast Dis. 6:13-26). The amplification of
int-2 and other genes in 11q13 is indicative of poor
prognosis (Schuuring, E. et al., 1992, Cancer Research
52:5229-5234; Champeme, M-H, et al., 1995, Genes,
Chromosomes and Cancer 12:128-133). Both mouse and
human int-2 have been sequenced (Moore, R. et al.,
1986, EMBO J 5:919-924). The gene encodes a protein of
about 27 kilodaltons (KD) which shows homology to both
basic and acidic fibroblast growth factors (Dickson, C.
et al., 1987, Nature (London) 326:833).

However, efforts to demonstrate the presence of
viruses in human breast cancer through search for viral
particles, immunological cross-reactivity, or sequence
homology have yielded contradictory results. Detectable
MMTV env gene-related antigenic reactivity
has been found in tissue sections of breast cancer
(Mesa-Tejada et al., 1978, Proc. Natl. Acad. Sci. USA
75:1529-1533; Levine, P. et al., 1980, Proc. Am. Assoc.
Cancer Res. 21:170; Lloyd, R. et al., 1983, Cancer
51:654-661), breast cancer cells in culture (Litvinov,
S.V. and Golovkina, T.V., 1989, Acta Virologica
33:137-142), human milk (Zotter, S. et al., 1980, Eur. J.
Cancer 16:455-467), in sera of patients (Day, N.K.
et al., 1981, Proc. Natl. Acad. Sci. USA 78:2483-2487),
in cyst fluid (Witkin, S.S. et al., 1981, J. Clin.
Invest. 67:216-222) and in particles produced by a
human breast carcinoma cell line (Keydar, I. et al.,
1984, Proc. Natl. Acad. Sci. USA 81:4188-4192).
Sequence homology to MMTV has been found in human DNA
under low stringency conditions of hybridization
(Callahan, R. et al., 1982, Proc. Natl. Acad. Sci. USA
79:5503-5507) and RNA related to MMTV has been detected
in human breast cancer cells (Axel, R. et al., 1972,
Nature 235:32-36). The presence of MMTV-related
sequences in lymphocytes from patients with breast
cancer has been reported (Crepin, M. et al., 1984,
Biochem. Biophys. Res. Comm. 118:324-331), as well as
detection of reverse transcriptase (RT) activity in
their monocytes (Al-Sumidaie, A.M. et al., 1988, Lancet
1:5-8). May and Westley (May and Westley, 1989, Cancer
Research 49:3879-3883) have reported the presence of
MMTV-like sequences arranged as tandem repeats only in
DNA from breast cancer cells.

These results have been difficult to interpret,
and theories linking MMTV or a related virus with human
breast cancer have fallen out of favor, in view of the
relatively recent discovery of human endogenous
retroviral sequences ("HERs"; Westley, B. et al., 1986,
J. Virol. 60:743-749; Ono, M. et al., 1986, J. Virol.
60:589-598; Faff, O. et al., 1992, J. Gen. Virology
73:1087-1097). Data which could be interpreted to
demonstrate the presence of MMTV-related sequences
could be more readily explained by endogenous human
retroviral sequences. Adding further confusion to the
picture, env gene-related antigenicity has been
detected in epitopes of human proteins (Hareuveni, M.
et al., 1990, Int. J. Cancer 46:1134-1135).

Brief Summary of the Invention

The present invention relates to methods for
diagnosing breast cancer in humans in which the
presence of mouse mammary tumor virus env gene-like
sequences bears a positive correlation to the existence
of malignant breast disease. It is based, at least in
part, on the discovery that 38 to 40 percent of human
breast cancer tissue samples tested contained gene
sequences homologous to the mouse mammary tumor virus
env gene that are substantially absent from other human
tumors and tissues. The invention also relates to
methods for diagnosing breast cancer in humans in which
the presence of retrovirus proviral fragments
substantially homologous to the env gene and/or 3' LTR
sequence of MMTV is detected. The molecular probes
used in these experiments were designed to avoid
cross-hybridization with endogenous human retroviral
sequences. The present invention further provides for
compositions of molecular probes which may be utilized
in such diagnostic methods.

Brief Description of the Figures

FIGURE 1: Amplification of 660 bp of MMTV-like
env gene. DNA was extracted from frozen tissues. PCR
was performed using primers 1 and 3. A: 2% agarose
gel electrophoresis. B: Southern blot hybridization
using 5' 32P-end-labeled probe 2. Lanes 1 and 3: breast
cancer; lanes 2 and 4: normal breast; lane 5: control
reaction (no DNA); lane E: MMTV env gene. M: molecular
weight marker. Arrow indicates 510 bp band.

FIGURE 2: Nested PCR. A: 2% agarose gel
electrophoresis. 1: Amplification of 686 bp of MMTV-like env
gene sequences using primers 1 and 4 and the product of
reaction A 1 as template. 2: Amplification of 250 bp
of MMTV-like env gene sequences using primers 2 and 3.
B, 1 and 2: Southern blot hybridization of the
amplified products using 5'-32P end-labeled probe 2a.

FIGURE 3: Amplification of 250 bp of MMTV-like
env gene. DNA was extracted from paraffin-embedded
tissue sections. PCR was performed using primers 2 and
3. A: 2% agarose gel electrophoresis. B: Southern blot
hybridization using 5'-32P-labeled probe 2a. Lane 1:
normal breast; lanes 2 to 5: breast cancer; lane E:
MMTV env gene. M: molecular weight marker. Arrow
indicates 298 bp band.

FIGURE 4: Nucleotide sequence of the cloned MMTV
env gene-like sequences as compared to the env
sequences of the GR and BR6 strains of MMTV using the
GCG program. *: potential glycosylation site; |: mismatch
to MMTV.

FIGURE 5: Southern blot hybridization of genomic
DNA. DNA was extracted from frozen tissues or cell
lines, digested with EcoRI and transferred to
nitrocellulose paper. Hybridization with 32P-labeled
clone 166. DNA from A, B, and G: env gene positive
breast cancer; C and D: env negative breast cancer;
E and F: normal breast; H: MCF-7 cells. M: molecular
weight marker. Arrow indicates 9 kb band.

FIGURE 6: Southern blot hybridization of genomic
DNA. Experimental conditions as in Fig. 5. DNA from
A and B: env negative breast cancer; C and D: env
positive breast cancer; E: molecular weight marker
(non-labelled); F to H: normal breast. Arrow indicates
position of 9 kb marker.

FIGURE 7: Map of MMTV.

FIGURE 8: Comparison of the nucleic acid sequence
of mouse mammary tumor env gene ("MMTENV"), showing
residues 976-1640, with the nucleic acid sequence of a
representative 660 bp sequence obtained by PCR reaction
of DNA from human breast cancer tissue ("MS1627").

FIGURE 9: Sequence of an about 2.6 kb MMTV-like
fragment detected in a human breast carcinoma.

Detailed Description of the Invention

The present invention relates to methods and 
compositions for diagnosing breast cancer in humans. 

The present invention provides for compositions
comprising an isolated and purified nucleic acid
molecule which (i) hybridizes to a gene of mouse
mammary tumor virus; (ii) is present in at least
20 percent of DNA samples prepared from breast cancer
tissue of different human subjects; and (iii) is
present in less than 5 percent of DNA samples prepared
from tissues other than breast cancer tissue from
different human subjects. A "gene of mouse mammary
tumor virus" includes, but is not limited to, the gag,
pol, and env genes and the 5' LTR and 3' LTR sequences
of MMTV. In preferred embodiments of the invention,
the mouse mammary tumor virus (hereafter "MMTV") gene
is the env gene and/or the 3' LTR sequence. The term
"hybridize" is used to refer to routine DNA-DNA or
DNA-RNA hybridization techniques under what would be
regarded, by the skilled artisan, as stringent
hybridization conditions. The phrase "is present"
indicates that a native form of the molecule, in an
unpurified state (for example, as part of chromosomal
DNA), may be detected by a standard laboratory
technique, such as Southern blot or polymerase chain
reaction (PCR). To be "present", the molecule may be
detectable by one technique but not others. To be
present in "less than 5 percent of DNA samples prepared
from tissues other than breast cancer tissue from
different human subjects", all non-breast cancer tissue
samples are considered together, but the total number
of samples must be large enough to give the 5 percent
value statistical significance that would be reasonable
to the skilled artisan.
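
The two frequency criteria above (present in at least 20 percent of breast cancer samples and in less than 5 percent of other samples) can be sketched as a simple check on sample counts. The function names below are hypothetical, and the Wilson score interval is only one assumed way of gauging whether a sample count is large enough to support a percentage; the specification does not prescribe any particular statistical test. The count of 2 positives among 100 non-breast-cancer samples is invented purely for illustration.

```python
from math import sqrt


def wilson_interval(positives, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    An assumed, illustrative choice of interval; not taken from the specification.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = positives / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def meets_prevalence_criteria(pos_cancer, n_cancer, pos_other, n_other,
                              min_cancer=0.20, max_other=0.05):
    """True if the observed frequencies satisfy both stated thresholds."""
    return (pos_cancer / n_cancer >= min_cancer
            and pos_other / n_other < max_other)


if __name__ == "__main__":
    # 121 of 314 is the breast cancer count reported in the description;
    # 2 of 100 for other tissues is a hypothetical figure.
    print(meets_prevalence_criteria(121, 314, 2, 100))
    print(wilson_interval(121, 314))
    print(wilson_interval(2, 100))
```

A small sample gives a wide interval, which is one way to express the requirement that the total number of samples be large enough for the 5 percent value to be meaningful.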

In order to identify such a nucleic acid molecule,
the sequence of MMTV may be compared, using a computer
database, to known human DNA sequences, and portions
of MMTV which are less than or equal to 25 percent
homologous to a human sequence may be selected for
further study. The term "homologous", as used herein,
refers to the presence of identical residues; for
example, a first sequence is considered 25 percent
homologous to a second sequence if it shares 25 percent
of the residues of the first sequence. Since there is
relatively greater likelihood that MMTV may bear
similarity to human retroviral-like sequences, it may
be preferable to evaluate whether a particular MMTV
nucleic acid sequence is homologous to such sequences,
for example, to endogenous human retrovirus sequences.
A prototype of such viruses is HERV-K10 (Ono, M.
et al., 1986, J. Virol. 60:589-598).
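
The definition of "homologous" given above, that is, the percentage of residues of the first sequence that are identical in the second, can be sketched as follows. This is a minimal illustration under a strong assumption: the two sequences are taken to be already aligned position by position without gaps, whereas an actual comparison would rely on a sequence analysis program to produce the alignment. The function names are hypothetical, and ambiguous residues (N) are counted as non-identical by choice.

```python
def percent_identity(first, second):
    """Percent of residues of `first` that are identical at the same position in `second`.

    Assumes an ungapped, position-by-position alignment (an illustrative
    simplification). Positions where `first` reads N are never counted as matches.
    """
    if not first:
        raise ValueError("first sequence must not be empty")
    matches = sum(1 for a, b in zip(first.upper(), second.upper())
                  if a == b and a != "N")
    return 100.0 * matches / len(first)


def is_low_homology(first, second, threshold=25.0):
    """True if `first` is at most `threshold` percent homologous to `second`."""
    return percent_identity(first, second) <= threshold


if __name__ == "__main__":
    # First ten residues of SEQ ID NO: 17 and SEQ ID NO: 18, respectively.
    print(percent_identity("TCCTCACTGC", "TCCTCACTGN"))
    print(is_low_homology("TCCTCACTGC", "TCCTCACTGN"))
```

Because the denominator is the length of the first sequence, the measure is not symmetric when the two sequences differ in length, which matches the wording of the definition.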

Once an MMTV gene sequence which is less than or
equal to 25 percent homologous to a human DNA sequence,
such as a human endogenous retroviral sequence, is
identified, the presence of nucleic acid molecules
having the MMTV gene sequence in human breast cancer
tissues and other tissues may be evaluated. Such
evaluations may be performed either by Southern blot
techniques, or, preferably, by polymerase chain
reaction (PCR) techniques, which are more sensitive.
In such a way, MMTV gene sequences which (i) hybridize
to at least 20 percent of DNA samples prepared from
breast cancer tissue of different human subjects and
(ii) hybridize to less than 5 percent of DNA samples
prepared from human tissues other than breast cancer
tissues may be identified. A nucleic acid molecule
having an MMTV gene sequence which satisfies these
requirements may then be used in diagnostic methods
which detect the presence of such sequence in human
breast tissue by standard techniques, including PCR
techniques which assay for the presence of the
molecule, but also, where appropriate, Southern blot,
Northern blot, or Western blot techniques, to name but
a few.
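
The two frequency requirements stated above can be expressed as a short
check. This is only a sketch: the function names are hypothetical, and
it does not address the separate requirement that the sample size be
large enough for statistical significance. The usage line borrows the
carcinoma and reduction-mammoplasty counts reported elsewhere in this
description (121 of 314 and 2 of 107); treating that single control
group as the whole "other tissues" pool is a simplifying assumption.

```python
def percent(positive: int, total: int) -> float:
    """Percentage of positive samples in a group."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * positive / total


def meets_selection_criteria(cancer_positive: int, cancer_total: int,
                             other_positive: int, other_total: int,
                             min_cancer_pct: float = 20.0,
                             max_other_pct: float = 5.0) -> bool:
    """True if (i) at least 20% of breast cancer samples and
    (ii) less than 5% of other-tissue samples are positive."""
    return (percent(cancer_positive, cancer_total) >= min_cancer_pct
            and percent(other_positive, other_total) < max_other_pct)


# Counts quoted in this description; pooling is simplified (see lead-in).
qualifies = meets_selection_criteria(121, 314, 2, 107)
```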



In preferred embodiments, the present invention
relates to a portion of MMTV localized between MMTV env
gene sequences 976 and 1640 (Majors, J.E. and Varmus,
H.E., 1983, J. Virol. 47:495-504; see Fig. 7). This
about 660 bp sequence (hereafter, "the 660 bp
sequence") has been found to exhibit low (16 percent)
homology to the prototype human endogenous retrovirus
HERV-K10, using the IBI/Pustell Sequence Analysis
Program, and has also been shown to be present in 121
(38.5%) of 314 unselected breast cancer tissue samples,
in cultured breast cancer cells, in 2 of 29 breast
fibroadenomas (6.9%) and in 2 of 107 breast specimens
from reduction mammoplasties (1.8%). The sequence was
not found in normal tissues including breast, in
lymphocytes from breast cancer patients, nor in other
human cancers or cell lines (see example section,
infra). Similarly, an about 250 bp sequence (hereafter
"the 250 bp sequence"), between positions 1388 and 1640
in the env gene, and therefore falling within the
660 bp sequence, was detected in 60 (39.7%) of 151
breast cancer samples, and in one of 27 normal breast
samples assayed from paraffin-embedded sections.
Cloning and sequencing of the 660 bp and 250 bp
sequences demonstrated that they are 95-99% homologous
to the MMTV env gene, but not to the known human
endogenous retroviruses ("HERs") nor to other viral or
human genes (<18%).

In another preferred embodiment, the present
invention relates to a nucleic acid molecule which
corresponds to a retroviral genomic fragment which has
substantial homology to the 3' LTR and/or env gene of
the MMTV genome, and is found in a substantial
percentage of breast cancer samples. By substantial
percentage is meant at least 20% of tested breast
cancer samples. Such a sequence is preferably comprised
of the 3' LTR region and all or part of the env gene,
although it may include more sequences of a retroviral
genome. Most preferably, the sequence is at least
comprised of an about 2.6 kb fragment which comprises
the 1,228 base pair (bp) sequence of the 3' LTR
sequence and 1,336 bp of the env gene sequence of MMTV
(Fig. 9) (SEQ ID NO:20). When compared with the two
strains of MMTV, C3H and BR6, the sequence homology was
90.8% and 90.7%, respectively. When compared with the
endogenous retroviral sequences (HUMERKA), sequence
homology was only 58% in 36 bp and 71% in 74 bp.

Retrovirus proviral sequences can be detected by
PCR technology using primers derived from the MMTV
genome. Such primers include primer 5L, containing
nucleotides 7376-7395 of the MMTV BR6 genome (5'-3':
CCAGATCGCCTTTAAGAAGG) (SEQ ID NO:11) and primer LTR3,
containing nucleotides 9918-9927 of the MMTV BR6 genome
(5'-3': CGAACAGACACAAAGCGACG) (SEQ ID NO:19). Other
oligonucleotides which correspond to or are homologous
to MMTV sequences can also be used as primers.
Nucleotide fragments which correspond to or are
homologous to the retroviral sequences isolated from
the breast cancer samples can also be used to amplify
additional retroviral fragments from the samples. Long
PCR techniques can be used to amplify longer stretches
of a proviral sequence.

The present invention provides for compositions
comprising an isolated and purified nucleic acid
molecule which hybridizes to the about 2.6 kb
retroviral fragment shown in Fig. 9 under stringent
conditions or is at least 90 percent homologous to said
fragment using the MacVector homology determining
program. Such a molecule may be used to diagnose breast
cancer in a subject, using methods which include PCR
and Southern blot methods.

Nucleic acids having the 660 bp sequence, the
250 bp sequence, or all or part of the about 2.6 kb
sequence, may therefore be used, according to the
invention, to diagnose breast cancer in a subject,
using methods which include PCR and Southern blot
methods. Where PCR methods are used, primers such as
those listed in Table 1, below, may be utilized.

The present invention provides for compositions
comprising essentially purified and isolated nucleic
acid having the 660 bp sequence or the 250 bp sequence
or an at least five bp, and preferably greater than or
equal to ten bp, subsequence thereof. In order to
maintain the desired specificity, such nucleic acid
molecules may preferably contain sequence falling
within the 660 bp sequence, but preferably do not
contain sequences from other portions of the MMTV
genome, which may, undesirably, hybridize to human
sequences which are not breast cancer specific, such as
HERs. Accordingly, the present invention provides for
compositions wherein the isolated and purified nucleic
acid molecule comprises at least a portion having a
nucleic acid sequence which hybridizes to a region of
the mouse mammary tumor virus env gene between residues
976 and 1640, or between residues 1388 and 1640, and
wherein the isolated and purified nucleic acid molecule
does not hybridize to any other region of the MMTV
genome.

The 660 bp sequence, in various embodiments, may
have a number of nucleotide sequences. For example, in
one embodiment, the 660 bp sequence may have a sequence
as set forth in Fig. 8 and designated "MMTENV-like
sequence" (SEQ ID NO:17), which depicts the MMTV env
sequence between residues 976 and 1640. In a second
series of embodiments, the 660 bp sequence may have a
sequence as set forth in Fig. 8 and designated "MS1627"
(SEQ ID NO:18), which depicts a predominant sequence
for the 660 bp sequence as it has been defined by
sequencing analysis of the products of PCR reactions
using DNA from human breast cancer tissues. In still
further embodiments, the 660 bp sequence may have
various other nucleotide sequences obtained by
sequencing the results of PCR reactions to detect the
presence of the 660 bp sequence in human breast cancer
tissues.

In related embodiments, the present invention
provides for compositions comprising PCR primers
that may be used to detect the presence of the
aforementioned molecules or other MMTV-like sequences.
For example, the compositions may comprise one or more
of the following primer molecules (5'-3'):
CCTCACTGCCAGATC (SEQ ID NO:1);
GGGAATTCCTCACTGCCAGATC (SEQ ID NO:2);
CCTCACTGCCAGATCGCCT (SEQ ID NO:3);
TACATCTGCCTGTGTTAC (SEQ ID NO:4);
CCTACATCTGCCTGTGTTAC (SEQ ID NO:5);
CCGCCATACGTGCTG (SEQ ID NO:6);
ATCTGTGGCATACCT (SEQ ID NO:7);
GGGAATTCATCTGTGGCATACCT (SEQ ID NO:8);
ATCTGTGGCATACCTAAAGG (SEQ ID NO:9);
GAATCGCTTGGCTCG (SEQ ID NO:10);
CCAGATCGCCTTTAAGAAGG (SEQ ID NO:11);
TACAGGTAGCAGCACGTATG (SEQ ID NO:12);
CGAACAGACACAAACACACG (SEQ ID NO:19).

The use of such compositions and molecules in PCR
and Southern blot techniques is illustrated in the
non-limiting examples set forth below. The correlation
between the presence of the MMTV-related nucleic acid
molecules described above and breast cancer allows such
molecules and compositions to be utilized in the
diagnosis of breast cancer. Accordingly, the present
invention provides for a method of diagnosing breast
cancer, wherein the detection of such nucleic acid
molecules bears a positive correlation to the existence
of breast cancer in a human. The results of such
evaluation, together with additional clinical symptoms,
signs, and laboratory test values, may be used to
formulate the complete diagnosis of the patient.

In further related embodiments, the present
invention provides for an essentially purified peptide
encoded by a nucleic acid molecule which (i) hybridizes
to a gene of MMTV; (ii) is present in at least
20 percent of DNA samples prepared from breast cancer
tissue of different human subjects; and (iii) is
present in less than 5 percent of DNA samples prepared
from tissues other than breast cancer tissue from
different human subjects. In preferred embodiments, the
MMTV gene is the env gene.

Such peptides may be used in the diagnosis of
breast cancer. Accordingly, the present invention
provides for a method of diagnosing breast cancer in
a human subject, comprising detecting the presence of
a peptide encoded by a nucleic acid molecule which
(i) hybridizes to the env gene of a mouse mammary tumor
virus; (ii) is present in at least 20 percent of DNA
samples prepared from breast cancer tissue of different
human subjects; and (iii) is present in less than
5 percent of DNA samples prepared from tissues other
than breast cancer tissue from different human
subjects.

The present invention also provides for antibodies
(including monoclonal and polyclonal antibodies) which
specifically bind to such peptides. Such antibodies may
be used in methods of diagnosing breast cancer, for
example, but not by way of limitation, by Western blot,
immunofluorescent techniques, and so forth.

In nonlimiting embodiments of the invention, the
skilled artisan may evaluate MMTV-like nucleic acid
molecules for regions which would be considered likely
to encode immunogenic peptides (using, for example,
hydropathy plots). Such peptides may then be sequenced
and used to produce antibodies that may be employed in
diagnostic methods as set forth above.
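
A hydropathy plot of the kind mentioned above can be sketched as a
sliding-window average. The sketch assumes the Kyte-Doolittle scale and
a 7-residue window; neither choice is specified in this description,
and the function name is hypothetical. The peptide scored is
LKRPGFQEHEMI (SEQ ID NO:13), one of the peptides named in this
description. Low (negative) window averages indicate hydrophilic
stretches, which are often treated as candidate antigenic regions; that
is a heuristic, not a guarantee of immunogenicity.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(peptide: str, window: int = 7) -> list[float]:
    """Mean hydropathy of each consecutive window along the peptide."""
    scores = [KYTE_DOOLITTLE[aa] for aa in peptide.upper()]
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the peptide length")
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


profile = hydropathy_profile("LKRPGFQEHEMI", window=7)

# Index (0-based) of the window with the lowest, i.e. most hydrophilic, mean.
most_hydrophilic_start = min(range(len(profile)), key=profile.__getitem__)
```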

For example, certain peptides encoded by portions
of the 660 bp sequence have been synthesized. These
peptides, which have the sequences LKRPGFQEHEMI (SEQ ID
NO:13) and GLPHLIDIEKRG (SEQ ID NO:14), have been used
to produce antibodies in rabbits, and the resulting
antisera have successfully identified breast cancer
cells positive for MMTV env-like sequences by PCR
assay. Other peptides encoded by the 660 bp sequence
which may be useful according to the invention include
TNCLDSSAYDTA (SEQ ID NO:15) and DIGDEPWFDD (SEQ ID
NO:16).

6. Example: The Detection of Mouse Mammary Tumor
Virus Env Gene-Like Sequences in Human Breast
Cancer Cells and Tissues

6.1. Materials and Methods
DNA from breast cancer tissue and other human
cancer tissues, human placentas, normal human tissues
including breast, and from several human cell lines
(including eight breast cancer cell lines), and two
normal breast cell lines was extracted following the
procedure of Delli Bovi et al. (1986, Cancer Res.
46:6333-6338). The DNA was resuspended in a solution
containing 0.05 M Tris-HCl buffer, pH 7.8, and 0.1 mM
EDTA, and the amount of DNA recovered was determined by
microfluorometry using Hoechst 33258 dye (Cesarone, C.
et al., 1979, Anal. Biochem. 100:188-197). Plasmids
containing the cloned genes of MMTV were obtained from
the ATCC, propagated in Escherichia coli cultures and
purified using anion-exchange minicolumns (Qiagen) or
by precipitation with polyethylene glycol (Sambrook J.,
et al., 1989, in "Molecular Cloning: A Laboratory
Manual", Cold Spring Harbor). Oligonucleotide primers
were synthesized at the core facilities of the
Brookdale Molecular Biology Center at Mount Sinai
School of Medicine.

Polymerase chain reaction (PCR) was performed
using Taq polymerase following the conditions
recommended by the manufacturer (Perkin Elmer Cetus)
with regard to buffer, Mg2+ and nucleotide
concentrations. Thermocycling was performed in a DNA
cycler by denaturation at 94°C for 3 min, followed by
either 35 or 50 cycles of 94°C for 1.5 min, 50°C for
2 min and 72°C for 3 min. The ability of the PCR to
amplify the selected regions of the MMTV env gene was
tested by using as positive templates the cloned MMTV
env gene and the genomic DNA of the MCF-7 cell line,
since it was shown to express gp52 immunological
determinants (Yang, N.S., et al., 1975, J. Natl. Cancer
Inst. 61:1205-1208). Optimal Mg2+ and primer
concentrations and requirements for the different
cycling temperatures were determined with these
templates. The master mix as recommended by the
manufacturer was used. To detect possible contamination
of the master mix components, a reaction without
template was routinely tested. λ DNA and control
primers provided by the manufacturer were used as a
control for polymerase activity. As an internal
control, amplification of a 120 bp sequence of the
estrogen receptor gene was assayed using primers
designed and generously provided by Dr. Beth Schachter
(Mount Sinai School of Medicine, N.Y.). In addition,
primers for actin 5 gene amplification were also used.

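
The thermocycling program described above can be written down as data,
which also makes it easy to estimate how long a run holds at
temperature. This is a minimal sketch with hypothetical names; it
counts only the programmed hold times and ignores ramp time between
temperatures, which depends on the instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temperature_c: float
    minutes: float


# Values as stated in the text: initial denaturation, then the repeated cycle.
INITIAL_DENATURATION = Step(94, 3.0)
CYCLE = (Step(94, 1.5), Step(50, 2.0), Step(72, 3.0))


def program_minutes(cycles: int) -> float:
    """Total programmed hold time in minutes (ramp times not included)."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return INITIAL_DENATURATION.minutes + cycles * sum(s.minutes for s in CYCLE)


minutes_35 = program_minutes(35)
minutes_50 = program_minutes(50)
```

On these assumptions each cycle holds for 6.5 min, so the 50-cycle
program is roughly 1.6 hours longer than the 35-cycle program.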
The product of the PCR was analyzed by
electrophoresis in a 2% agarose gel. A 1 kb DNA ladder
(Gibco BRL) was used to identify the size of the PCR
product. To determine if the amplified sequences of the
middle region of the 660 bp faithfully reproduced the
sequences of the env gene of MMTV, an 18-mer sequence
within the env gene was used as a probe for the 660 bp
amplified sequence. The 18-mer probe was 5' end-labeled
with 32P-ATP using T4 polynucleotide kinase and
purified by the NENSORB nucleic acid purification
cartridge (NEN). Southern blot hybridization was
performed using the conditions described by Saiki
et al. (1985, Science 230:1350-1354).

The product of the PCR (660 bp or 250 bp) was
cloned directly from the reaction mixture into the TA
cloning vector (Invitrogen) using the TA cloning kit
and following the conditions recommended by the
supplier. Direct cloning of the fragment isolated from
the gel was also performed. Plasmid DNA was purified
by CsCl density gradient centrifugation or by
precipitation with polyethylene glycol (Sambrook
et al., 1989, in "Molecular Cloning: A Laboratory
Manual", Cold Spring Harbor), restricted with HindIII
and EcoRI, electrophoresed in 2% agarose gels and
transferred to nitrocellulose filters. Southern blot
hybridization was carried out using a 5'-terminal
labeled internal probe as described above. Cloning
procedures were performed in laboratories totally
separate from those where PCR was carried out.
Automated DNA sequencing (using Applied Technology
Sequencer Model 373A) was performed in the Brookdale
Molecular Biology Center. Sequence homology was
determined using the IBI MacVector, GenBank and GCG
programs.

To prevent contamination of the samples, processing
of human tissues was performed in a laminar flow
hood. DNA extractions were done in a chemical hood
located in a different room from that where PCR was
performed. PCR assays were assembled in a biological
hood provided with ultraviolet light. Aerosol-resistant
tips and dedicated positive-displacement
pipettes were used throughout. All equipment used for
PCR (microcentrifuge, electrophoresis apparatus,
pipettors) was cleaned each time with 10% sodium
hypochlorite to assure DNA decontamination (Prince and
Andrus, 1992, Biotechniques 12:358-36). After the
initial experiments were performed, the plasmid
containing the MMTV env gene was frozen and never used
again, to avoid contamination. However, to detect
plasmid contamination from our own env gene clones,
primers were designed to amplify plasmid sequences.
All the authentic MMTV env positive samples were then
tested and found negative for plasmid contamination.

Southern blotting and hybridization were performed
as described (Southern, E.M., 1975, J. Mol. Biol.
98:503-517), using the 660 bp cloned sequences labeled
by the random primer procedure (Feinberg, A.P., et al.,
1983, Anal. Biochem. 132:6-13). Prehybridization and
hybridization were performed in a solution containing
6 x SSPE, 5% Denhardt's, 0.5% SDS, 50% formamide, and
100 µg/ml denatured salmon testis DNA, incubated for
18 hrs at 42°C, followed by washings with 2 x SSC and
0.5% SDS at room temperature and at 37°C and finally in
0.1 x SSC with 0.5% SDS at 68°C for 30 min (Sambrook
et al., 1989, in "Molecular Cloning: A Laboratory
Manual", Cold Spring Harbor). For paraffin-embedded
tissue sections the conditions described by Wright and
Manos (1990, in "PCR Protocols", Innis et al., eds.,
Academic Press, pp. 153-158) were followed using
primers designed to detect a 250 bp sequence.

6.2. Results

6.2.1. Selection of Specific MMTV Env Gene Sequences
A computer search for MMTV env gene homologous
sequences was first performed, since sequence homology
between the human endogenous retroviral sequences and
MMTV had been described. The prototype of this group
of human endogenous retroviruses is HERV-K10 (Ono, M.
et al., 1986, J. Virol. 60:589-598). The sequences of
the env gene of MMTV (Majors, J.E. and Varmus, H.E.,
1983, J. Virol. 47:495-504) were aligned with sequences
of the env gene of the human endogenous retrovirus
HERV-K10 (Ono, M. et al., 1986, J. Virol. 60:589-598),
using the IBI/Pustell Sequence Analysis Program. A
region of 660 bp of low homology (16%) was localized
between MMTV env gene sequences 976 and 1640 (Majors,
J.E. and Varmus, H.E., 1983, J. Virol. 47:495-504).
This internal domain of the outer membrane of the env
gene has only one glycosylation site and is highly
conserved between strains. Two primers comprising 15 bp
sequences at positions 976-990 (primer 1) and 1626-1640
(primer 3) were first synthesized. Later, longer
primers were synthesized (1N and 3N). An 18-mer
sequence in the middle of the 660 bp MMTV env region
(1388-1405) (primer 2) was used as a probe to identify
the 660 bp sequence. A second oligomer probe was
synthesized comprising the sequence 1554 to 1568
(primer 2a) to be used for hybridization when a
sequence of around 250 bp (between positions 1388 and
1640) was amplified. For nested PCR reactions (Mullis,
K.B. and Faloona, F.A., 1987, Meth. Enzymol. 155:335-
350), another primer comprising sequences 1647 to 1661
(primer 4) was synthesized to be used with primer 1 in
the first reaction and primers 2 and 3 in the second.
Modified primers with GC clamps and extra sequences
were also synthesized and used in the PCR (primers 1a
and 3a). Another set of primers comprising sequences
984 to 1003 (5L) and 1558 to 1577 (3L) were
subsequently developed because their Tm's matched and
provided better amplification than the original
primers. The sequences are represented in Table 1.
All of them were productive in amplification reactions.

Table 1. Primer and probe sequences and location
in mouse mammary tumor virus env gene

Designation   Sequence (5'-3')           Location
1             CCTCACTGCCAGATC            976-990
1a            GGGAATTCCTCACTGCCAGATC     976-990
1N            CCTCACTGCCAGATCGCCT        976-994
2             TACATCTGCCTGTGTTAC         1388-1405
2N            CCTACATCTGCCTGTGTTAC       1386-1405
2a            CCGCCATACGTGCTG            1554-1568
3             ATCTGTGGCATACCT            1640-1626
3a            GGGAATTCATCTGTGGCATACCT    1640-1626
3N            ATCTGTGGCATACCTAAAGG       1640-1621
4             GAATCGCTTGGCTCG            1661-1647
5L            CCAGATCGCCTTTAAGAAGG       984-1003
3L            TACAGGTAGCAGCACGTATG       1558-1577
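
The expected product sizes follow directly from the primer coordinates
given in the text and in Table 1. The sketch below, with hypothetical
names, computes the span from the 5' end of a forward primer to the
5' end of a reverse primer for the primer pairs used in the direct and
nested reactions. It assumes the coordinates are inclusive positions in
the MMTV env gene and ignores any extra non-template bases (such as
those on primers 1a and 3a).

```python
# (start, end) as listed: forward primers low-to-high, reverse primers high-to-low.
PRIMER_LOCATIONS = {
    "1": (976, 990),
    "2": (1388, 1405),
    "3": (1640, 1626),
    "4": (1661, 1647),
}


def amplicon_length(forward: str, reverse: str) -> int:
    """Inclusive length in bp spanned by a forward and a reverse primer."""
    start = min(PRIMER_LOCATIONS[forward])
    end = max(PRIMER_LOCATIONS[reverse])
    if end < start:
        raise ValueError("reverse primer lies upstream of the forward primer")
    return end - start + 1


outer = amplicon_length("1", "4")  # first round of the nested reaction
full = amplicon_length("1", "3")   # the "660 bp sequence"
inner = amplicon_length("2", "3")  # the "250 bp sequence"
```

On these coordinates the products are 686, 665 and 253 bp, which is
consistent with the "about 660 bp" and "about 250 bp" wording used
throughout.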



6.2.2. Detection of MMTV-Like Env Gene
Sequences in Human Breast Tumor DNA
PCR was performed on DNA extracted from breast
cancer tissues, normal breast tissues and from the
plasmid containing the env gene of MMTV, using primers
1 and 3. Photographs of the ethidium bromide stained
gels of the PCR product reveal the presence of an
approximately 660 bp sequence in some of the tumors
(Fig. 1A, lanes 1 and 3) but not in the normal tissue
samples (Fig. 1A, lanes 2 and 4). As a positive
control the MMTV env gene was also amplified (Fig. 1A,
lane E). Similar results were obtained with modified
primers 1a, 3a, 3L and 5L. Southern blot hybridization
of the gel with 32P-labeled 18-mer oligonucleotide
(primer 2) indicated that this internal sequence was
present in the amplified material (Fig. 1B) and that
the bands in the gel were not artifactual.

Our initial effort was to analyze a representative
sample of breast cancer specimens as well as normal
tissues and other tumors. To date 343 breast tumors
have been processed, DNA extracted and PCR performed.
Of these 343 tumors, 314 were carcinomas and 29 were
fibroadenomas. Amplification of sequences of 660 bp
was observed in 121 of the carcinomas (38.5%) and in
2 of the 29 fibroadenomas (6.9%). These sequences
were confirmed to be MMTV env gene-like sequences by
hybridization with the labeled specific probe
containing the internal sequences. These sequences
were not detected in the DNAs extracted from 20 normal
organs, 23 cancers from other organs and 26 samples of
blood lymphocytes including 7 from breast cancer
patients whose breast specimens were positive. From
107 samples of normal breast obtained from reduction
mammoplasties, 2 were positive (1.8%). In addition to
DNA from lymphocytes from seven positive patients, DNA
from their normal breast tissue of the operated breast
was tested in 4 cases. All were negative (Table 2).
Finally, DNA of the MCF-7 and ED (a cell line
developed in our laboratory from the pleural effusion
of a patient with an env-positive breast tumor) breast
cancer cell lines were shown to contain the 660 bp MMTV
env gene-like sequences (Table 3), while four other
breast cancer cell lines were positive only for the
250 bp sequence (T47-D, BT-474, BT-20 and MDA-MB-231).

Table 2. Detection of MMTV env gene-like
sequences in human DNA extracted
from fresh or frozen tissues

Sample                     Number   MMTV env gene   % Positive
                                    sequences
Breast Carcinomas          314      121             38.5%
Breast Fibroadenomas       29       2               6.9%
Normal Breasts             107      2               1.8%
*Normal Breasts            4        negative
Tumors other than breast   23       negative
Normal tissues             20       negative
Lymphocytes                26       negative
**Lymphocytes              7        negative

*  Histologically normal tissue from same breast
   as positive cancer.
** Lymphocytes from breast cancer patients who were
   positive for MMTV env gene sequences in the tumor.

Table 3. Detection of MMTV env gene-like sequences
in DNA from human cell lines in culture

Human Cell Lines                         MMTV env gene sequence
MCF-7        (breast carcinoma)          positive
T47-D        (breast carcinoma)          negative
BT-20        (breast carcinoma)          negative
MDA-MB-231   (breast carcinoma)          negative
ZR-75-1      (breast carcinoma)          negative
SK-BR-3      (breast carcinoma)          negative
BT-474       (breast carcinoma)          negative
ED           (breast carcinoma)          positive
MCF-10       (normal breast)             negative
HB-447       (normal breast)             negative
HL-60        (promyelocytic leukemia)    negative
K562         (erythroleukemia)           negative
Jurkat       (T cell leukemia)           negative
Hep G-2      (hepatoma)                  negative

The nested polymerase chain reaction was used in
several instances to increase sensitivity and
specificity, thus reducing the probability of false
positives. In Fig. 2, results of a representative
nested reaction are shown using primers 1 and 4 in the
first reaction (Fig. 2A) and primers 2 and 3 in the
second reaction. The specificity of the reaction can be
seen in the second amplification (Fig. 2B).

To study a large number of samples and to be able
to perform archival studies, PCR of paraffin-embedded
tissue sections was also carried out. Primers 2 and 3
were used to amplify a 250 bp sequence within the
660 bp stretch when DNA was extracted from
paraffin-embedded tissue sections, since larger size
sequences are difficult to amplify after fixation.
Tumor DNA was amplified (Fig. 3A, lanes 2-5) whereas
normal breast DNA was not (Fig. 3A, lane 1). The
identification of this 250 bp sequence with the
MMTV-like env gene was confirmed by hybridization with
an internal probe (primer 2a) as shown in Fig. 3B.
Using this procedure we have analyzed 151 breast cancer
samples and found that 60 (39.7%) possess the 250 bp
sequence. Of the 27 normal breast samples obtained from
reduction mammoplasties assayed by this procedure, one
was positive (3.7%). These results, in conjunction with
those obtained from lymphocytes and from normal breast
tissue of patients whose breast cancer was PCR
positive, indicate that MMTV-like sequences are present
in a significant number of human breast cancer DNA
samples, which cannot be explained by DNA polymorphism.
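
The detection frequencies quoted in this section can be recomputed from
the raw counts. The sketch below uses hypothetical names and only the
counts stated in the text and in Table 2. Note that 2 of 107 works out
to about 1.87%, which the text reports as 1.8%.

```python
# (sample group, number tested, number positive), as stated in the text.
DETECTION_COUNTS = [
    ("Breast carcinomas, fresh or frozen (660 bp)", 314, 121),
    ("Breast fibroadenomas (660 bp)", 29, 2),
    ("Normal breast, reduction mammoplasty (660 bp)", 107, 2),
    ("Breast cancer, paraffin-embedded (250 bp)", 151, 60),
    ("Normal breast, paraffin-embedded (250 bp)", 27, 1),
]


def percent_positive(positive: int, total: int) -> float:
    """Percentage of samples in which the sequence was detected."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * positive / total


rates = {group: percent_positive(positive, total)
         for group, total, positive in DETECTION_COUNTS}

for group, rate in rates.items():
    print(f"{group}: {rate:.2f}%")
```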

6.2.3. Cloning and Sequencing of the
MMTV-Like Env Gene Sequences
To find out whether there was homology to the MMTV
env gene throughout the whole 660 bp stretch, the
product of the PCR from 8 different tumors was cloned
and sequenced. In Fig. 4 the sequences of different
clones comprising around 600 bp are represented, as
aligned to the MMTV env gene sequence of the GR and BR6
strains (Redmond, S. and Dickson, C., 1983, EMBO J.
2:125-131). This domain of the env gene in the GR
strain is 100% homologous to the C3H strain and 98% to
the BR6 strain (Majors, J.E. and Varmus, H.E., 1983,
J. Virol. 47:495-504; Moore, R. et al., 1987, J. Virol.
61:480-490). Evaluation of the clones indicated that
homology to the MMTV env gene varied from 95% to 99%.
Another seven clones comprising only 250 bp were also
sequenced. Homology to the MMTV env gene varied from
95% to 99% (data not shown). When compared to the human
endogenous provirus HERV-K10, the homology of all the
clones was less than 15%. When compared against all
known viral and human genes (more than 130,000 entries)
using the IBI MacVector, GenBank and GCG programs, the
highest homology recorded was 18%.

6.2.4. Southern Blot Analysis
Using Cloned Sequences
To investigate whether the env gene-like sequences
were present in human DNA, Southern blot hybridization
was performed using the cloned sequence as probe. DNAs
from normal breast tissues, env positive or negative
breast tumors, tumors other than breast and breast
cancer cell lines were restricted with EcoRI and in
some instances with PstI, BglII or KpnI. EcoRI is a
frequent cutter restriction enzyme that digests MMTV
proviral DNA between env and pol genes. Four different
cloned 660 bp sequences were used as probes after
labeling with 32P by random prime-labeling. Results of
some of the Southern blot hybridization experiments are
shown in Fig. 5. They reveal the presence of a labeled
restriction fragment migrating at approximately 7-8 kb
in breast cancer DNA and in ED cells, and two fragments
in MCF-7 cells. Different restriction patterns were
observed with the other three enzymes. The 660 bp
sequence was absent in 10 normal tissues, 10
fibroadenomas and 10 tumors from other tissues. It is
important to emphasize that hybridization conditions
for these experiments were stringent (as described in
Section 6.1) to avoid interference with endogenous
sequences that might interact with the probes.

7. Example: Detection of a Retrovirus
Proviral Fragment in Human
Breast Cancer Cells and Tissues

7.1. Materials and Methods
To detect longer retrovirus proviral fragments in
breast cancer samples, DNA was extracted from breast
carcinoma tissue samples as described above in
Section 6.1. Two rounds of long PCR were performed on
the DNA using primers 5L (SEQ ID NO:11) and LTR3 (SEQ
ID NO:19). The primer 5L contains nucleotides 7370-7395
of the MMTV BR6 genome (5'-3': CCAGATCGCCTTTAAGAAGG)
(SEQ ID NO:11) and primer LTR3 contains nucleotides
9918-9927 of the MMTV BR6 genome (5'-3':
CGAACAGACACAAAGCGACG) (SEQ ID NO:19). Long PCR was
performed using protocols described by the manufacturer
(Perkin Elmer, Foster City, CA). The amplified
retroviral fragment isolated from the breast cancer
sample was cloned into the TA cloning vector
(Invitrogen) and automated sequencing was performed
as described in Section 6.1.

7.2. Results
An approximately 2.6 kb retroviral fragment
containing 1,228 bp of the 3' LTR sequence and 1,336 bp
of the env gene sequence of a potential provirus was
detected in a human breast carcinoma tissue sample by
the long PCR technique using the 5L and LTR3 primers.
The sequence of this retroviral fragment is shown in
Fig. 9 (SEQ ID NO:20).

When compared with the two strains of MMTV, C3H
and BR6, the sequence homology was 90.8% and 90.7%,
respectively, over the MMTV genomic fragment from
nucleotides 7370-9937. When compared with the
endogenous retroviral sequences (HUMERKA), sequence
homology was only 58% in 36 bp and 71% in 74 bp.

8. Discussion
The search for virus-related sequences in human
breast cancer has been hampered by great variation
reported in previous studies, by the presence of
endogenous retroviral sequences in human DNA and by the
lack of sensitivity of the methods employed. The
studies reported herein circumvent these deficiencies
by focusing on sequences with low homology to human
endogenous retroviruses, by investigating a large
number of tumors and several types of controls and
by using the most sensitive technology presently
available.

The results indicate that unique MMTV env gene
sequences were present in 38.5% of the breast cancer
samples analyzed and 39.7% of archival samples of
breast cancer and that these sequences were absent in
normal tissues including lymphocytes from patients with
positive breast cancer and in cancers other than
breast. Normal breast tissue and fibroadenomas had a
low frequency (1.8 to 6.9%) of positive results. When
cloned and sequenced, the sequences were found to be
highly homologous to the MMTV env gene, but not to the
endogenous retroviral sequences. Furthermore,
experiments in which the cloned amplified sequences
were used for hybridization with DNA from breast cancer
or normal tissues revealed that homologous DNA was only
present in breast cancer DNA. The results also
indicate that a human breast carcinoma sample contained
an about 2.6 kb MMTV-like fragment comprised of 1,336
bp of the env gene and 1,228 bp of the 3' LTR.

The detection of MMTV env gene sequences in two
fibroadenomas out of 29 and in two normal breast tissue
samples out of 107 samples is of uncertain
significance. Although such results could potentially be
artifactual, and thus may represent false positives,
they may alternatively indicate the presence of
histologically unrecognized cells that were or will be
neoplastic.
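
As an illustrative check of the low-frequency range quoted above, the two counts can be converted to percentages as follows. This is only a sketch; the helper name `percent_positive` is hypothetical.

```python
def percent_positive(positives: int, total: int) -> float:
    """Return the percentage of samples that tested positive."""
    if total <= 0:
        raise ValueError("total must be a positive number of samples")
    return 100.0 * positives / total


# Counts taken from the text: 2 of 29 fibroadenomas, 2 of 107 normal breast samples.
fibroadenoma_rate = percent_positive(2, 29)
normal_breast_rate = percent_positive(2, 107)
print(f"fibroadenomas: {fibroadenoma_rate:.2f}%")
print(f"normal breast: {normal_breast_rate:.2f}%")
```

The two values come out at about 6.90% and about 1.87%, which is consistent with the 1.8 to 6.9% range stated in the text if the lower figure was truncated rather than rounded.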

Ninety percent (90%) of the breast cancers tested
were invasive ductal carcinomas, which reflects the
prevalence of this type of neoplasm. Most patients
were node-positive, which is probably artifactual: the
tumor had to be sufficiently large to provide an
aliquot for research, and tumor size correlates with
node positivity.

It is unlikely that differences in homology
between the MMTV env gene and the cloned human sequences
are generated by errors committed by the Taq
polymerase. It has been estimated that the rate of
nucleotide misincorporation is 1 x 10^-5 per cycle
(Ehrlich et al., 1991, Science 252:1643-1651) and,
therefore, only a total of 0.32 nucleotides
misincorporated should be expected in 660 bp after
50 cycles. The differences in homology between clones
from different patients are likely to represent
heterogeneity of the env gene.
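
The order of magnitude of this estimate can be reproduced with a simple product, assuming (as a simplification not stated in the text) that the error rate applies per nucleotide per cycle and that errors accumulate linearly. The function name below is hypothetical.

```python
import math


def expected_misincorporations(rate_per_base_per_cycle: float,
                               length_bp: int,
                               cycles: int) -> float:
    """Expected number of misincorporated bases, assuming linear accumulation."""
    return rate_per_base_per_cycle * length_bp * cycles


# Values quoted in the text: 1 x 10^-5 per cycle, 660 bp, 50 cycles.
expected = expected_misincorporations(1e-5, 660, 50)
print(f"expected misincorporations: {expected:.2f}")
```

Under these assumptions the product is about 0.33, in line with the figure of roughly 0.32 given above; in either case less than one error is expected per 660 bp clone.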

In contrast to earlier, ambiguous data associating
MMTV-like sequences with human breast cancer, we have
clearly demonstrated the existence of such sequences in
breast cancer cells which cannot be explained by any
known human endogenous retroviral sequence. Our data do
not support the results of earlier studies which
indicated that, as in the mouse, MMTV-like sequences
were found in lymphocytes from two patients with breast
cancer (Crepin, M. et al., 1984, Biochem. Biophys. Res.
Comm. 118:324-331). The absence of MMTV env-like
sequences in lymphocytes could reflect the fate of a
unique lymphocyte subset over decades between initial
encounter and the appearance of clinical breast cancer;
alternatively, the human disease may differ from the
mouse model. Results from attempts to identify unique
MMTV-like pol gene sequences have shown that they
cannot be distinguished from the reverse transcriptase
sequences of endogenous retroviruses (Deen, K.C. and
Sweet, R.W., 1986, J. Virol. 57:422-432).

The MMTV env gene-like and 3' LTR-like sequences
found in tumor DNA could originate from integrated
MMTV-like sequences of a human mammary tumor virus.
Polymorphism of endogenous retroviral
sequences is conceivable but can be ruled out because
these sequences were not detected in lymphocytes from
the positive patients, in sections of the cancerous
breast from which abnormal cells were absent or in
normal breast tissue from patients with MMTV env-like
positive tumors. Recombination during tumorigenesis
between endogenous sequences to resemble the MMTV env
genes seems highly unlikely since no known gene or
viral sequence is more than 18% homologous to the
660 bp sequence. The longer, approximately 2.6 kb
MMTV-like fragment detected in a human breast carcinoma
had minimal homology (58% in 36 bp and 71% in 74 bp) to
endogenous human retroviral sequences. Thus, the most
conservative interpretation is that our findings
represent exogenous sequences from an agent similar to
MMTV. Recombination between endogenous and exogenous env
gene sequences is known to accelerate the development of
malignancies in mice (DiFronzo, N.L. and Holland, C.A.,
1993, J. Virol. 67:3763-3770). Whether the MMTV-like
sequences belong to an entire acquired provirus or to
an exogenous fragment integrated into endogenous
sequences is presently not known. Experiments are in
progress to distinguish between these possibilities.

Several genetic alterations have been identified
in human breast cancer that can be useful as markers
for prevention, detection or prognosis (reviewed in
Runnenbaum, I. et al., 1991, Proc. Natl. Acad. Sci. USA
88:10657-10661). The BRCA1 and BRCA2 genes have
recently been described. They account for at least
5% of breast cancer and are related to familial breast
cancer (Miki, Y. et al., 1994, Science 266:66-71;
Wooster, R. et al., 1994, Science 265:2088-2090). We
have primary evidence that familial clustering of the
MMTV env gene-like sequences occurs, accounting for an
even higher percentage of cancers in affected families
(Holland et al., 1994, Proc. Am. Assoc. Cancer Res.
35:218). The presence of MMTV-like sequences may be
correlated with special clinical disease status, may
provide another potential molecular marker, and may
distinguish a subset of human breast cancer for which
viral etiology is tenable. This has implications for
epidemiology, therapy and prevention.

Various publications are cited herein, the
contents of which are hereby incorporated by reference
in their entireties.
SEQUENCE LISTING

(1) GENERAL INFORMATION
    (i) APPLICANT: HOLLAND, JAMES
    (ii) TITLE OF THE INVENTION: DETECTION OF MAMMARY TUMOR VIRUS-LIKE
         SEQUENCES IN HUMAN BREAST CANCER
    (iii) NUMBER OF SEQUENCES: 20
    (iv) CORRESPONDENCE ADDRESS:
        (A) ADDRESSEE: Brumbaugh, Graves, Donohue & Raymond
        (B) STREET: 30 Rockefeller Plaza
        (C) CITY: New York
        (D) STATE: NY
        (E) COUNTRY: USA
        (F) ZIP: 10112-0228
    (v) COMPUTER READABLE FORM:
        (A) MEDIUM TYPE: Diskette
        (B) COMPUTER: IBM Compatible
        (C) OPERATING SYSTEM: DOS
        (D) SOFTWARE: FastSEQ Version 1.5
    (vi) CURRENT APPLICATION DATA:
        (A) APPLICATION NUMBER: NOT YET ASSIGNED
        (B) FILING DATE: OQ-NOV-1996
        (C) CLASSIFICATION:
    (vii) PRIOR APPLICATION DATA:
        (A) APPLICATION NUMBER: 08/555,394
        (B) FILING DATE: 09-NOV-1995
    (viii) ATTORNEY/AGENT INFORMATION:
        (A) NAME: Kole, Lisa B
        (B) REGISTRATION NUMBER: 35,225
        (C) REFERENCE/DOCKET NUMBER: 30363-PCT - 165/
    (ix) TELECOMMUNICATION INFORMATION:
        (A) TELEPHONE: 212-408-2628
        (B) TELEFAX: 212-765-2519
        (C) TELEX:



(2) INFORMATION FOR SEQ ID NO:1:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 15 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (ix) FEATURE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
    CCTCACTGCC AGATC
(2) INFORMATION FOR SEQ ID NO:2:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 22 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (ix) FEATURE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
    GGGAATTCCT CACTGCCAGA TC

(2) INFORMATION FOR SEQ ID NO:3:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 19 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (ix) FEATURE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
    CCTCACTGCC AGATCGCCT

(2) INFORMATION FOR SEQ ID NO:4:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 18 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (ix) FEATURE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
    TACATCTGCC TGTGTTAC



(2) INFORMATION FOR SEQ ID NO:5:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 20 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (ix) FEATURE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
    CCTACATCTG CCTGTGTTAC

(2) INFORMATION FOR SEQ ID NO:6:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 15 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (ix) FEATURE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
    CCGCCATACG TGCTG

(2) INFORMATION FOR SEQ ID NO:7:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 15 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (ix) FEATURE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
    ATCTGTGGCA TACCT



(2) INFORMATION FOR SEQ ID NO:8:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 23 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (ix) FEATURE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
    GGGAATTCAT CTGTGGCATA CCT

(2) INFORMATION FOR SEQ ID NO:9:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 20 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (ix) FEATURE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
    ATCTGTGGCA TACCTAAAGG



(2) INFORMATION FOR SEQ ID NO:10:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 15 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (ix) FEATURE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
    GAATCGCTTG GCTCG

(2) INFORMATION FOR SEQ ID NO:11:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 20 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (ix) FEATURE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
    CCAGATCGCC TTTAAGAAGG

(2) INFORMATION FOR SEQ ID NO:12:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 20 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (ix) FEATURE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
    TACAGGTAGC AGCACGTATG



(2) INFORMATION FOR SEQ ID NO:13:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 12 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE: N-terminal
    (vi) ORIGINAL SOURCE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
    Leu Lys Arg Pro Gly Phe Gln Glu His Glu Met Ile
    1               5                   10

(2) INFORMATION FOR SEQ ID NO:14:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 12 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE: N-terminal
    (vi) ORIGINAL SOURCE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
    Gly Leu Pro His Leu Ile Asp Ile Glu Lys Arg Gly
    1               5                   10



(2) INFORMATION FOR SEQ ID NO:15:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 12 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE: N-terminal
    (vi) ORIGINAL SOURCE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
    Thr Asn Cys Leu Asp Ser Ser Ala Tyr Asp Thr Ala
    1               5                   10

(2) INFORMATION FOR SEQ ID NO:16:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 10 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE: N-terminal
    (vi) ORIGINAL SOURCE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
    Asp Ile Gly Asp Glu Pro Trp Phe Asp Asp
    1               5                   10



(2) INFORMATION FOR SEQ ID NO:17:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 662 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

TCCTCACTGC CAGATCGCCT TTAAGAAGGA CGCCTTCTGG GAGGGAGACG AGTCTGCTCC   60
TCCACGGTGG TTGCCTTGCG CCTTCCCTGA CCAAGGGGTG AGTTTTTCTC CAAAAGGGGC  120
CCTTGGGTTA CTTTGGGATT TCTCCCTTCC CTCGCCTAGT GTAGATCAGT CAGATCAGAT  180
TAAAAGCAAA AAGGATCTAT TTGGAAATTA TACTCCCCCA GTCAATAAAG AGGTTCATCG  240
ATGGTATGAA GCAGGATGGG TAGAACCTAC ATGGTTCTGG GAAAATTCTC CTAAGGATCC  300
CAATGATAGA GATTTTACTG CTCTAGTTCC CATACAGAAT TGTTTCGCTT AGTTGCAGCC  360
TCAAGATATC TTATTCTCAA AAGGCAGGAT TTCAGGAACA TGAGATGATT CCTACATCTC  420
TGTGTTACTT ACCCTTATGT CATATTATTA GGATTACCTC AGCTAATAGA TATAGAGAAA  480
GAGGATCTAC TTTTCATATT TCCTGTTCTT CTTGTAGATT GACTAATTGT TTAGATTCTT  540
CTGCCTACGA CTATGCAGCG ATCATAGTCA AGAGGCCGCC ATACGTGCTG CTACCTGTAG  600
ATATTGGTGA TGAACCATGG TTTGATGATT CTGCCATTCA AACCTTTAGG TATGCCACAG  660
AT                                                                 662
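
For illustration only, the relationship between several of the primers listed above and SEQ ID NO:17 can be checked by exact string matching on either strand. The sketch below is an assumption-laden example: the helper names `reverse_complement` and `locate` are hypothetical, the template is transcribed from the listing above, and exact matching is a stricter criterion than hybridization, so a primer with a mismatch would be reported as absent even though it may still anneal.

```python
SEQ17_ROWS = [
    "TCCTCACTGC CAGATCGCCT TTAAGAAGGA CGCCTTCTGG GAGGGAGACG AGTCTGCTCC",
    "TCCACGGTGG TTGCCTTGCG CCTTCCCTGA CCAAGGGGTG AGTTTTTCTC CAAAAGGGGC",
    "CCTTGGGTTA CTTTGGGATT TCTCCCTTCC CTCGCCTAGT GTAGATCAGT CAGATCAGAT",
    "TAAAAGCAAA AAGGATCTAT TTGGAAATTA TACTCCCCCA GTCAATAAAG AGGTTCATCG",
    "ATGGTATGAA GCAGGATGGG TAGAACCTAC ATGGTTCTGG GAAAATTCTC CTAAGGATCC",
    "CAATGATAGA GATTTTACTG CTCTAGTTCC CATACAGAAT TGTTTCGCTT AGTTGCAGCC",
    "TCAAGATATC TTATTCTCAA AAGGCAGGAT TTCAGGAACA TGAGATGATT CCTACATCTC",
    "TGTGTTACTT ACCCTTATGT CATATTATTA GGATTACCTC AGCTAATAGA TATAGAGAAA",
    "GAGGATCTAC TTTTCATATT TCCTGTTCTT CTTGTAGATT GACTAATTGT TTAGATTCTT",
    "CTGCCTACGA CTATGCAGCG ATCATAGTCA AGAGGCCGCC ATACGTGCTG CTACCTGTAG",
    "ATATTGGTGA TGAACCATGG TTTGATGATT CTGCCATTCA AACCTTTAGG TATGCCACAG",
    "AT",
]
# Join the rows and drop the spacing between ten-base groups.
SEQ17 = "".join(SEQ17_ROWS).replace(" ", "")

PRIMERS = {
    "SEQ ID NO:1": "CCTCACTGCCAGATC",
    "SEQ ID NO:6": "CCGCCATACGTGCTG",
    "SEQ ID NO:9": "ATCTGTGGCATACCTAAAGG",
    "SEQ ID NO:12": "TACAGGTAGCAGCACGTATG",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N is kept as N)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def locate(primer: str, template: str):
    """Return (strand, 0-based start) of the first exact match, or None."""
    index = template.find(primer)
    if index != -1:
        return ("+", index)
    index = template.find(reverse_complement(primer))
    if index != -1:
        return ("-", index)
    return None


for name, primer in PRIMERS.items():
    print(name, locate(primer, SEQ17))
```

In this transcription, SEQ ID NO:1 and SEQ ID NO:6 match the listed strand, while SEQ ID NO:9 and SEQ ID NO:12 match as reverse complements near the 3' end, which is what would be expected of opposing primer pairs.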



(2) INFORMATION FOR SEQ ID NO:18:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 663 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
    [Only the final ten bases of each 60-base row are legible.]
    ... AGTCTGCTCC   60
    ... CAAAAGGGGC  120
    ... CAGATCAGAT  180
    ... AGGTTCATCG  240
    ... CTAAGGATCC  300
    ... AGTTGCAGCC  360
    ... CCCTACATCT  420
    ... ATATAGAGGA  480
    ... TTTAGATTCT  540
    ... GCTACCTGTA  600
    ... GTATNCCACA  660
    ...             663

(2) INFORMATION FOR SEQ ID NO:19:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 20 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
    CGAACAGACA CAAACACACG

(2) INFORMATION FOR SEQ ID NO:20:
    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 2598 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
    (iii) HYPOTHETICAL: NO
    (iv) ANTISENSE: NO
    (v) FRAGMENT TYPE:
    (vi) ORIGINAL SOURCE:
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

CGAACAGACA CAAACACACG AGAGGTGAAT GTTAGGACTG TTGCAAGTTT ACTCAAAAAA   60
CAGCACTCTT TTATATCATG GTTTACATAA GCATTTACAT AAGACTTGGA TAAGTTCCAA  120
AAGAACATAG GAGAATAGAA CACTCAGAGC TTAGATCAAA ACATTTGATA CCAAACCAAG  180
TCAGGAAACC ACTTGTCTCA CATCCTTGTT TTAAGAACAG TTTGTGACCC TGAACTTACT  240
TAAACCTTGG GAACCGCAAN GTTGGGCTCA TAAAGGTTAT CCATTATAGC TCATGCCAAA  300
ATTATCTGCA GAAATGTGTT CCTAATTGTC TAGCCACTGC CCCCTCCCTT GGTATAATGA  360
AAATCTTTCC CCCAACGTTC ATCCCACTCC CCTAGATAAA TATAATCATG TACCTGTTGT  420
TTTATGTCGT CTTTTTCTTC CTGAGTTAAC ACACACCAAG GAGGTCTAGC TCTGGCGAGT  480
CTTTCACGAA AGGGGAGGGA TCTGTACAAC ACTTTATAGC CGTTGACTGT GACCCACCTA  540
TCGAAATTTA AATCGTATCT TCCTGTATAT GGTAGCGGGG CGTCTGTTGG TCTGTAGATG  600
TAAGTCCCGG TTGCCACCAC CTGTCTCCTA TTTTGACAAG CGTACTCCTC TTTCCCCTTT  660
TTACTTCTAG GCCTGAGGCC CTTAGTCCTT GCACCTGTTC TTCAACTGAG GTTGAGCGTC  720
TCTTTCTATT TTCTATTCCC ATTTCTAACC TTTGAATTTG AGTAAATATA GTGCTAAAAG  780
ACAAAGATTC ATTTCTTAAC ATCATGATTA ATAATCGACC TATTGGATTG GTCTTATTGG  840
TAAAAATATA ATTTTTAGCA AGCATTCTTA TTTCTATTTC TGAAGGACAA AGTCGGTGTG  900
GCTTGTAANA GGAANTTGGC TGTGGTCCTT GCCCCACGAG GAAGGTCGAG TTCTCCGAAT  960
TGTTTAGATT GTAATCTTGC ACAGAAGAGT TATTAAAAGA ATCAAGGGTG AGAGCCCTGC 1020
GAGCACGAAC CGCAACTTCC CCCAATAGCC CCAGGCAAAG CAGAGCTATG CCAAGTTTGC 1080
AGCAGANAAT GAGTATGTCT TTGTCTGATG GGCTCATCCG CGTGCACGCA GACGGGTCGT 1140
CCTTGGTGGG AAACAACCCC TTGGCTGCTT CTCTCCTAAG TGTAGGACAC TCTCGGGAGT 1200
TCAACCATTT CTGCTGCAGG CGCGGCATTT CCCCCTTTTT TCTTTTTTAA AAGAAGCACG 1260
TTAAGATCTG ACTGCACTTG GTCAAGGCTC TTCGCAAAGC ACTGGAAAAT AACGGGGAAA 1320
ATCATAAGTA CTATGACCAA AAGCAGGGCT CCAACTCCTA TAAAAATGAA ATATTGTGTT 1380
CTAATCCAAT GGATTTAAAG CCTTTACTCC ATTGGCNAAG GANTGANCCA ACCCCTGAGG 1440
TCCCTGCGTT CAAATTTTTT TGCTCNTATC CTAATCCAAT TGGTAACCCC GTTTNTTTTT 1500
GAAACTCATG TCTTCAAATG CCCAATAAAT GAGCCCTGGT TCTTTCCCAG CTCTCAGAAG 1560
CATTATACGG NANAGGTGTG ACACAGCATA AAATCATAAT TTGCATGACA CCTAGTGGAC 1620
ATTCTGGTCT TTAAGTTTGC CACATCTTGT CCCAACTCTA AAACTACTTC TTCTAAAGCA 1680
TTAAGTCTAG CTTTCAATTT TAAGTCTATT ATTCTTTGTT CAGATNAGGC TAATGTAACA 1740
TTTCTATGAA GATTATTAAC AAACGTAGCA GTTTGCATCT CCTTAACTAA GGCAGTAGTA 1800
GCTACAGCAA AGGAAGTGAT AATAGCAATT AAAGCAGATA TGCCCAGAAT AATGGCAGCG 1860
ACGAATCGCT TAGCTCGAAT TAAATCTGTG GCATACCTAA AGGTTTGAAT GGCAGAATCA 1920
TCAAACCATG GTTCATCACC AATATCTACA GGTTACAACA CATATGGCGG CCCCTTGAAT 1980
ATGAATCGCT GCATATCCGT NGGCAAAAAA TCTAACCATT ATTCCTCCTN CCNAAAAACG 2040
GGATTTGAAA NTTATNCCCC TTNCCCCNAA CCCANACCGA GGTACCCCAT AATGNGGGGG 2100
GTATCTANAA NAGGGCATAG GGGTAAGAAA AACGGCAGAG NGGGATCNTT TATGTTCNGG 2160
AAATTCNGGG TTTGGGAGAA TAAGATTCTG GAGGCTGCAA ATTAAGGGAA ACATTNTGTA 2220
TGGGGAATAG AGCAGTAAAA TCTCTATCAT GGGGATCTTT AGGGAGAATT TTCCCAGGAA 2280
CCAAGTAGGT TCNAACCCAT CNTGCTTCAT ACCATCGATG AACNTCTTTA TTGACAGGGG 2340
GAGTATAATT TCCAAATAGA TCCTTTTTGT TTTTAATCTG ATCTGACTGA TCTACACTAG 2400
GCGGGGGAAG GGAGAAATCC CAAAGTAACC CAAGGGCCCC TTTTGGAGAA AAACTCACCC 2460
CCTGGTCAGG GAAGGCGCAA GGCAACCACC GTGGAGGAGC AGACTCGTCT CCCTCCCAGA 2520
AGGCGTCCTT CTTAAAGGCG ATCTGGAGGA GCAGACTCGT CTCCCTCCCA GAAGGCGTCC 2580
TTCTTAAAGG CGATCTGG                                               2598
Claims

1. A composition comprising an oligonucleotide primer
which may be used to detect the presence of a
nucleic acid molecule which (i) hybridizes to the
env gene of a mouse mammary tumor virus; (ii) is
present in at least 38 percent of DNA samples
prepared from breast cancer tissue of different
human subjects; and (iii) hybridizes to less than
7 percent of DNA samples prepared from tissues
other than breast cancer tissue from different
human subjects.

2. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCTCACTGCCAGATC (SEQ ID NO:1).

3. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
GGGAATTCCTCACTGCCAGATC (SEQ ID NO:2).

4. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCTCACTGCCAGATCGCCT (SEQ ID NO:3).

5. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
TACATCTGCCTGTGTTAC (SEQ ID NO:4).

6. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCTACATCTGCCTGTGTTAC (SEQ ID NO:5).

7. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCGCCATACGTGCTG (SEQ ID NO:6).

8. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
ATCTGTGGCATACCT (SEQ ID NO:7).

9. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
GGGAATTCATCTGTGGCATACCT (SEQ ID NO:8).

10. The composition of claim 1, wherein the
oligonucleotide primer comprises a sequence
selected from the group consisting of
ATCTGTGGCATACCTAAAGG (SEQ ID NO:9);
GAATCGCTTGGCTCG (SEQ ID NO:10);
CCAGATCGCCTTTAAGAAGG (SEQ ID NO:11); and
TACAGGTAGCAGCACGTATG (SEQ ID NO:12).

11. An essentially purified peptide encoded by a
nucleic acid molecule which (i) hybridizes to
a gene of MMTV; (ii) is present in at least
20 percent of DNA samples prepared from breast
cancer tissue of different human subjects; and
(iii) is present in less than 5 percent of DNA
samples prepared from tissues other than breast
cancer tissue from different human subjects.

12. An antibody which specifically binds to the
peptide of claim 11.

13. The peptide according to claim 11 which comprises
the amino acid sequence LKRPGFQEHEMI (SEQ ID
NO:13).

14. An antibody which specifically binds to the
peptide of claim 13.

15. The peptide according to claim 11 which comprises
the amino acid sequence GLPHLIDIEKRG (SEQ ID NO:14).

16. A method of diagnosing breast cancer in a human
subject, comprising detecting the presence of a
peptide encoded by a nucleic acid molecule which
(i) hybridizes to the env gene or 3' LTR of a
mouse mammary tumor virus; (ii) is present in at
least 20 percent of DNA samples prepared from
breast cancer tissue of different human subjects;
and (iii) is present in less than 5 percent of DNA
samples prepared from tissues other than breast
cancer tissue from different human subjects.

17. The method according to claim 16, wherein the
presence of a peptide comprising the amino acid
sequence LKRPGFQEHEMI (SEQ ID NO:13) is detected
by the binding of an antibody specific to the
peptide.

18. The method according to claim 16, wherein the
presence of a peptide comprising the amino acid
sequence GLPHLIDIEKRG (SEQ ID NO:14) is detected
by the binding of an antibody specific to the
peptide.

19. The method according to claim 16, wherein the
presence of a peptide comprising the amino acid
sequence TNCLDSSAYDTA (SEQ ID NO:15) is detected
by the binding of an antibody specific to the
peptide.

20. The method according to claim 16, wherein the
presence of a peptide comprising the amino acid
sequence DIGDEPWFDD (SEQ ID NO:16) is detected by
the binding of an antibody specific to the
peptide.

21. A composition comprising an oligonucleotide primer
which may be used to detect the presence of a
nucleic acid molecule which (i) hybridizes to a
nucleic acid comprised of a sequence selected from
the group consisting of the env gene and the 3'
LTR of a mouse mammary tumor virus; (ii) is
present in a substantial percentage of DNA samples
prepared from breast cancer tissue of different
human subjects; and (iii) hybridizes to less than
5 percent of DNA samples prepared from tissues
other than breast cancer tissue from different
human subjects.

22. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CCAGATCGCCTTTAAGAAGG (SEQ ID NO:11).

23. The composition of claim 1, wherein the
oligonucleotide primer comprises the sequence
CGAACAGACACAAACACACG (SEQ ID NO:19).

CGAACAGACACAAACACACGAGAGGTGAATGTTAGGACTGTTGCAAGTTTA
CTCAAAAAACAGCACTCTTTTATATCATGGTTTACATAAGCATTTACATAAGA
CTTGGATAAGTTCCAAAAGAACATAGGAGAATAGAACACTCAGAGCTTAGAT
CAAAACATTTGATACCAAACCAAGTCAGGAAACCACTTGTCTCACATCCTTG
TTTTAAGAACAGTTTGTGACCCTGAACTTACTTAAACCTTGGGAACCGCAAN
GTTGGGCTCATAAAGGTTATCCATTATAGCTCATGCCAAAATTATCTGCAGA
AATGTGTTCCTAATTGTCTAGCCACTGCCCCCTCCCTTGGTATAATGAAAAT
CTTTCCCCCAACGTTCATCCCACTCCCCTAGATAAATATAATCATGTACCTGT
TGTTTTATGTCGTCTTTTTCTTCCTGAGTTAACACACACCAAGGAGGTCTAGC
TCTGGCGAGTCTTTCACGAAAGGGGAGGGATCTGTACAACACTTTATAGCC
GTTGACTGTGACCCACCTATCGAAATTTAAATCGTATCTTCCTGTATATGGTA
GCGGGGCGTCTGTTGGTCTGTAGATGTAAGTCCCGGTTGCCACCACCTGTC
TCCTATTTTGACAAGCGTACTCCTCTTTCCCCTTTTTAGTTCTAGGCCTGAGG
CCCTTAGTCCTTGCACCTGTTCTTCAACTGAGGTTGAGCGTCTCTTTCTATTT
TCTATTCCCATTTCTAACCTTTGAATTTGAGTAAATATAGTGCTAAAAGACAA
AGATTCATTTCTTAACATCATGATTAATAATCGACCTATTGGATTGGTCTTATT
GGTAAAAATATAATTTTTAGCAAGCATTCTTATTTCTATTTCTGAAGGACAAA
GTCGGTGTGGCTTGTAANAGGAANTTGGCTGTGGTCCTTGCCCCACGAGGA
AGGTCGAGTTCTCCGAATTGTTTAGATTGTAATCTTGCACAGAAGAGTTATTA
AAAGAATCAAGGGTGAGAGCCCTGCGAGCACGAACCGCAACTTCCCCCAAT
AGCCCCAGGCAAAGCAGAGCTATGCCAAGTTTGCAGCAGANAATGAGTATG
TCTTTGTCTGATGGGCTCATCCGCGTGCACGCAGACGGGTCGTCCTTGGTG
GGAAACAACCCCTTGGCTGCTTCTCTCCTAAGTGTAGGACACTCTCGGGAG
TTCAACCATTTCTGCTGCAGGCGCGGCATTTCCCCCTTTTTTCTTTTTTAAAA
GAAGCACGTTAAGATCTGACTGCACTTGGTCAAGGCTCTTCGCAAAGCACT
GGAAAATAACGGGGAAAATCATAAGTACTATGACCAAAAGCAGGGCTCCAA
CTCCTATAAAAATGAAATATTGTGTTCTAATCCAATGGATTTAAAGCCTTTAC
TCCATTGGCNAAGGANTGANCCAACCCCTGAGGTCCCTGCGTTCAAATTTTT
TTGCTCNTATCCTAATCCAATTGGTAACCCCGTTTNTTTTTGAAACTCATGTC
TTCAAATGCCCAATAAATGAGCCCTGGTTCTTTCCCAGCTCTCAGAAGCATT
ATACGGNANAGGTGTGACACAGCATAAAATCATAATTTGCATGACACCTAGT
GGACATTCTGGTCTTTAAGTTTGCCACATCTTGTCCCAACTCTAAAACTACTT
CTTCTAAAGCATTAAGTCTAGCTTTCAATTTTAAGTCTATTATTCTTTGTTCAG
ATNAGGCTAATGTAACATTTCTATGAAGATTATTAACAAACGTAGCAGTTTGC
ATCTCCTTAACTAAGGCAGTAGTAGCTACAGCAAAGGAAGTGATAATAGCAA
TTAAAGCAGATATGCCCAGAATAATGGCAGCGACGAATCGCTTAGCTCGAAT
TAAATCTGTGGCATACCTAAAGGTTTGAATGGCAGAATCATCAAACCATGGT
TCATCACCAATATCTACAGGTTACAACACATATGGCGGCCCCTTGAATATGA
ATCGCTGCATATCCGTNGGCAAAAAATCTAACCATTATTCCTCCTNCCNAAA
AACGGGATTTGAAANTTATNCCCCTTNCCCCNAACCCANACCGAGGTACCC
CATAATGNGGGGGGTATCTANAANAGGGCATAGGGGTAAGAAAAACGGCA
GAGNGGGATCNTTTATGTTCNGGAAATTCNGGGTTTGGGAGAATAAGATTCT
GGAGGCTGCAAATTAAGGGAAACATTNTGTATGGGGAATAGAGCAGTAAAA
TCTCTATCATGGGGATCTTTAGGGAGAATTTTCCCAGGAACCAAGTAGGTTC
NAACCCATCNTGCTTCATACCATCGATGAACNTCTTTATTGACAGGGGGAGT
ATAATTTCCAAATAGATCCTTTTTGTTTTTAATCTGATCTGACTGATCTACACT
AGGCGGGGGAAGGGAGAAATCCCAAAGTAACCCAAGGGCCCCTTTTGGAG
AAAAACTCACCCCCTGGTCAGGGAAGGCGCAAGGCAACCACCGTGGAGGA
GCAGACTCGTCTCCCTCCCAGAAGGCGTCCTTCTTAAAGGCGATCTGGAGG
AGCAGACTCGTCTCCCTCCCAGAAGGCGTCCTTCTTAAAGGCGATCTGG

FIG. 9